Chapter 1
Introduction 


Cooperative control systems are increasingly emerging as significant alternatives to their centralized counterparts. The rising interest in deploying cooperative systems is fueled by the development of decentralized systems with low cost and performance advantages. For example, mobile exploration and information gathering tasks can often be accomplished more cheaply and more reliably by swarms of small autonomous robots than by a single, more sophisticated one. Cooperative control is also applied to many tasks that cannot be performed by a single system, e.g. satellite arrays that enable global communication, geographically remote systems that communicate over a network, and others.

Figure 1.1: A swarm of robots is expected to explore unknown planets.

The goal of our research is to investigate optimal control in cooperative systems using algorithms inspired by biology. We begin with a review of collective behavior in biological systems.

1.1 Cooperative Biological Systems 

Animal aggregation is a common phenomenon in nature, seen in organisms that range in complexity from primitive zooplankton to advanced mammals. Many species exhibit collective movement patterns that are highly organized, compared to the seemingly random individual behaviors. For example, a school of fish can move together in a tight formation and respond almost as fast as a single organism to evade approaching danger. Worker honey bees can distribute themselves among different nectar sources in accordance with the profitability of each source. Ants can recruit their nest-mates to form a trail along the most efficient path between the nest and food when foraging [1, 2].

The above examples show that aggregate behaviors in these animals may have special group-level properties that go beyond the abilities of any individual. If all group members' behaviors were coordinated by a centralized "leader", that leader would need the capability to communicate with the others and alter their behaviors. Given the qualitatively identical behaviors of all members of an insect aggregate, as well as their physical limitations, we can conclude that there are no such leaders in these groups (a conclusion supported by other research [1, 2, 4]). Therefore, some of the awe-inspiring group behaviors in nature come about as the result of individuals' self-organized actions.

For instance, at the individual level, honey bees receive limited information from their workmates and go to forage at the selected flowers. This type of behavior might be expected to lead to a random distribution over the different sources, because the message each bee obtains does not give it accurate information about the profitability of each nectar source. At the group level, however, foragers are remarkably well dispatched over the different flowers in accordance with the distribution of nectar among the sources. Coordinating a colony of bees to achieve such a complicated collective behavior seems very difficult for any individual bee. A reasonable explanation is that bees only follow some simple rules throughout their foraging activities, while the resulting collective behavior turns out to be highly organized [1].

In conclusion, individual behavior is "unsophisticated" due to the individuals' physical limitations, in contrast to the complex performance of the whole group. This suggests that there is an intrinsic mechanism in insect aggregates that overcomes individuals' drawbacks and yields results that might be impossible for individuals to attain. It is the cooperation between group members, i.e. the rule that each individual complies with, that yields group patterns 1 qualitatively different from, and more elegant than, those of individual behaviors.

1 We use “group pattern” to refer to the collective movement pattern of a group. 
We have seen that biological systems, especially social insects, demonstrate many promising cooperative solutions to complicated tasks. Many of these tasks are similar (at least functionally) to what one might want to do with cooperative engineered collectives. In addition, individual members of a biological collective are similar to the units of a cooperative control system in the sense that they are equipped with limited sensing, communication and computing capabilities. What we are essentially interested in is trading individual capability for cooperation, so that a complex task can be achieved with less sophisticated equipment: low power, short sensing range and low communication burden. For successful prototypes we look to natural examples, such as ants, which are able to find the most efficient path even though individual ants are short-sighted and have limited intelligence. Natural systems have developed such capabilities to solve various problems through evolution and natural selection, and may offer us some clues on how to proceed [1, 26].


1.2 Research Objectives 

The objective of this work is to investigate the cooperative solution of a class of optimal 
control problems using groups of agents 2 with limited sensing and computing capabilities. 
Our approach will be to postulate rules for individual behavior, inspired from observations 
of biological systems, and then investigate the “group pattern” that emerges. Rules for 
individual agents will be obtained by: 

1. Constructing a proper model for the observed collective movement patterns of certain 
biological systems, including ant colonies. An effective model will allow us to capture 
some aspects of the “experience” accumulated through natural selection. 

2. Extracting simple "rules" that capture individual behavior within the group. These rules should be kept simple with respect to the computation and communication resources required to implement them, so that they can be applied to cooperative control systems such as cheap autonomous robots.

3. Exploring how these rules can be applied to artificial collectives in order to solve optimal control problems that are hard or impossible for an individual to solve. This will involve combining existing methods of optimal control with the specified rules.

2 Throughout the document we will use "agent" to refer to a member of a group of control systems.
1.3 Outline


The rest of this paper is organized as follows: in Chapter 2 we review various recent research directions in cooperative systems. A class of algorithms for cooperative optimal
control inspired from the observed movements of ant colonies will be introduced in Chapter 
3, along with a discussion of the algorithms’ potential advantages. Chapter 4 presents some 
current progress concerning the proposed algorithms, including convergence analysis, special 
cases and numerical experiments. Finally, the ongoing work and an outline of possible 
approaches for its completion are given in Chapter 5. 
Chapter 2

Literature Review 


The potential of a cooperating group to “do better than the sum of its parts” has already 
seeded a variety of recent research directions in engineering, from modeling of animal groups 
[1, 24, 25, 26], to distributed collective covering and searching [28, 29], estimating by groups 
[30, 31, 32], cooperative robotic teams [33, 34, 42] and biologically-motivated optimization 
[36, 27]. These works typically treat narrowly defined problems [36, 32, 27], discuss only the feasibility of special tasks [33, 34, 42], or show the effectiveness rather than the optimality of various proposed algorithms [28, 29]. Here, we review some of these and other relevant works.

2.1 Animal Group Pattern Modeling 

The work of [24] proposed a simple model of the movement of n autonomous agents that share the same speed but have different headings. Each agent of the group updates its heading using the "nearest neighbor rule"

$$\theta_i(t+1) = \frac{1}{1 + n_i(t)}\Big(\theta_i(t) + \sum_{j \in \mathcal{N}_i(t)} \theta_j(t)\Big)$$

where $\theta_i(t)$ is the heading of the i th agent, $\mathcal{N}_i(t)$ is the set of its neighbors and $n_i(t)$ is the number of neighbors of the i th agent at time t. Under this rule, and provided the agents remain connected often enough, all agents' headings converge to a common constant value as time goes on. The theoretical explanation for this convergence is provided in [25], along with several similar models inspired by [24], such as "leader following", which shows that if one agent acts as the "leader" of the group, all agents will eventually point in the same direction as the leader. The "nearest neighbor rule" can cause all the members of a group to move in the same direction even though there is no centralized coordination and an agent's set of nearest neighbors may change as the system evolves. The models developed in [24, 25] have been used to explain how a flock of birds or a school of fish manages to move in tight formation as a single entity.
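The update rule above can be sketched in a few lines. This is a minimal illustration, assuming (for simplicity) a fixed, connected neighbor graph given by a hypothetical adjacency matrix; in [24, 25] the neighbor sets change as the agents move. Headings are averaged as plain numbers, mirroring the linear form of the rule.

```python
import numpy as np

def nearest_neighbor_step(theta, adjacency):
    """One synchronous update: theta_i <- (theta_i + sum of neighbor headings) / (1 + n_i)."""
    A = np.asarray(adjacency, dtype=float)
    n = A.sum(axis=1)                      # n_i(t): number of neighbors of agent i
    return (theta + A @ theta) / (1.0 + n)

# Hypothetical example: four agents on a line graph 0-1-2-3 with random initial headings.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
rng = np.random.default_rng(0)
theta0 = rng.uniform(-np.pi / 2, np.pi / 2, size=4)

theta = theta0.copy()
for _ in range(500):
    theta = nearest_neighbor_step(theta, adjacency)

print(theta.max() - theta.min())   # spread of headings after 500 updates
```

Because each update is a weighted average, the common limit is a weighted average of the initial headings (the weights depend on the graph), so it always lies between the smallest and largest initial heading.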



Figure 2.1: The flow diagram illustrates the model of how honey bees allocate the foragers. 

Another mathematical model is constructed in [1] to describe the foraging activities of worker honey bees. Each honey bee complies with certain rules to determine where it will go to forage. This process is described by the flow diagram illustrated in Fig. 2.1. At the bifurcations of the diagram, honey bees decide which nectar source to forage at and whether or not to dance (dancing is the way honey bees transfer information). The decision-making process is modeled by the probabilities of taking the various actions. For example, $P_X^A$ represents the probability that a bee watches other dancers after it unloads the nectar collected from flower A, $P_D^A(1 - P_X^A)$ represents the probability of dancing for flower A, and $P_F^A$ represents the probability of following other dancers to forage at flower A. Noting that honey bees make decisions only after receiving limited information from their workmates, [1] proposed a set of simple equations to describe these probabilities, e.g.

$$P_F^A = \frac{D_A d_A}{D_A d_A + D_B d_B}$$

where $D_A$ represents the number of dancers for flower A and $d_A$ is the proportion of time that these foragers actually dance ($D_B$ and $d_B$ are defined analogously for flower B). Other probabilities can be assumed to be constant. Simulations showed a collective result that was qualitatively similar to what is observed in real bee colonies.
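The recruitment formula can be evaluated directly. The sketch below uses hypothetical argument names and illustrative numbers (not data from [1]) to show how dance activity for flower A translates into the probability that a dance follower is recruited to A; with only two sources, the probabilities for A and B are complementary.

```python
def follow_probability(dancers_a, dance_frac_a, dancers_b, dance_frac_b):
    """P_F^A = D_A d_A / (D_A d_A + D_B d_B): chance that a bee following a dance
    ends up recruited to flower A rather than flower B."""
    weight_a = dancers_a * dance_frac_a
    weight_b = dancers_b * dance_frac_b
    return weight_a / (weight_a + weight_b)

# Illustrative numbers: 30 dancers for A dancing half of the time,
# 10 dancers for B dancing 30% of the time.
p_a = follow_probability(30, 0.5, 10, 0.3)
p_b = follow_probability(10, 0.3, 30, 0.5)
print(p_a, p_b)   # the two probabilities sum to 1
```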
2.2 Models of Ant-Trail Formation


One of the awe-inspiring phenomena in nature is the foraging activity of ant colonies, which includes discovering food, recruiting nest-mates and forming trails. When an ant finds food, it recruits other ants nearby to carry the food back to the nest. These co-workers rapidly form a well-defined trail between the nest and the food, although they are homogeneously distributed at first. Finding an efficient path between the nest and the food seems too complicated a problem for an individual ant to solve, especially if one considers the ant's tiny size relative to obstacles in the environment, such as stones, sticks and crevices. Nonetheless, a colony of ants always seems to be able to complete this task [1]. To explore the intrinsic mechanism that leads to collective efficiency as opposed to individual clumsiness, several models of ant-trail formation have been proposed.

The work of [1] described a model of how ants use pheromonal secretions to choose which path to take. According to this model, pheromonal secretions are laid along the paths by ants to mark a trail and recruit other nest-mates. At the same time, the secretions evaporate as time goes on. When an ant comes to a location where several trails cross, it tends to follow the path with the highest concentration. As illustrated in Fig. 2.2, the probability of taking the left branch of a "fork" in the terrain is quantified as

$$P_L = \frac{(k + C_L)^n}{(k + C_L)^n + (k + C_R)^n}$$

The parameters $C_L$ and $C_R$ represent the pheromone concentrations on the left and right branches, while $n$ and $k$ are constants corresponding to the degree of nonlinearity of the choice and the attraction of an unmarked branch, respectively. The key point is that the pheromonal secretions play a "positive feedback" role. Although an individual ant knows little about the entire environment and the distribution of its co-workers, simulations show that the colony has the collective potential to find the shortest path.

Figure 2.2: An ant chooses the path in accordance with pheromone concentrations.
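The positive-feedback effect of the choice rule can be seen in a small "binary bridge" sketch. It assumes two branches of equal length, one unit of pheromone deposited per passage, no evaporation, and illustrative constants k = 20 and n = 2; these choices are assumptions for the example, not parameters taken from [1].

```python
import random

def p_left(c_left, c_right, k=20.0, n=2.0):
    """Probability of choosing the left branch given pheromone levels C_L and C_R."""
    a = (k + c_left) ** n
    b = (k + c_right) ** n
    return a / (a + b)

def binary_bridge(n_ants=1000, k=20.0, n=2.0, seed=0):
    """Ants cross a two-branch bridge one at a time; each passage deposits one unit
    of pheromone on the chosen branch (no evaporation in this sketch)."""
    rng = random.Random(seed)
    c_left = c_right = 0.0
    for _ in range(n_ants):
        if rng.random() < p_left(c_left, c_right, k, n):
            c_left += 1.0
        else:
            c_right += 1.0
    return c_left, c_right

left, right = binary_bridge()
print(left, right)   # with n > 1, one branch typically ends up carrying most of the traffic
```

Even though both branches start unmarked and identical, early random fluctuations are amplified by the nonlinear choice rule; with n = 1 the amplification is much weaker.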

Another model of ant-trail formation on a plane was explored in [26]. The basic rule in this model is that each ant "follows" one of its co-workers instead of measuring pheromonal secretions, as Fig. 2.3 illustrates. In [26], the pheromonal secretions laid by an ant are used to trace its own trail and find its way back to the nest, but not to recruit its nest-mates. Paraphrasing [26], the path traveled by a single ant is a curve $x_k(t) : [0, T] \to \mathbb{R}^2$ with $\dot{x}_k = u(x_k)$ ($u \in \mathbb{R}^2$). The boundary conditions for these systems are $x_0(0) = x_0$ and $x_0(T) = x_f$, which represent the starting point (nest) and the target point (food), respectively. Any ant can trace its own trajectory back to the nest, so we have a sequence of ants departing from $x_0$. Each ant moves with unit speed, and there are $\Delta$ units of time between the departure times of successive ants. At every instant, each ant except the first one follows its predecessor by pointing its velocity vector in a straight line toward the predecessor. In short, for $k = 1, 2, 3, \ldots$

$$\dot{x}_k(t) = \frac{x_{k-1}(t) - x_k(t)}{\|x_{k-1}(t) - x_k(t)\|}$$

with $x_k(t) = x_0$ for $t \le k\Delta$ and $x_k(T) = x_f$ if $x_k$ reaches the target $x_f$. For the case $x_k(t) \in \mathbb{R}^2$, it has been shown that if the initial ant $x_0(t)$ has access to a sub-optimal path from $x_0$ to $x_f$, then the trajectories $\{x_k\}$ converge to the straight line connecting $x_0$ and $x_f$.

Figure 2.3: Ants find the shortest path joining two members
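The pursuit rule can be simulated with a simple time discretization. The sketch below assumes explicit Euler steps of length dt, a hypothetical initial "tent" path from (0, 0) through (0.5, 0.5) to (1, 0), and stores each trajectory in the ant's own time (index 0 is its departure), so the follower at its own step i aims at the leader's sample i + lag, i.e. where the leader is at the same absolute time.

```python
import numpy as np

def polyline(points, dt, n_samples):
    """Sample a polygonal path at unit speed every dt; hold the end point afterwards."""
    pts = np.asarray(points, dtype=float)
    cum = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))])
    s = np.minimum(np.arange(n_samples) * dt, cum[-1])
    return np.stack([np.interp(s, cum, pts[:, 0]), np.interp(s, cum, pts[:, 1])], axis=1)

def follow(leader, lag, dt):
    """Unit-speed pursuit of the predecessor, which departed lag * dt time units earlier."""
    n = len(leader)
    out = np.empty_like(leader)
    out[0] = leader[0]
    for i in range(n - 1):
        aim = leader[min(i + lag, n - 1)]      # leader's position at the same absolute time
        diff = aim - out[i]
        dist = np.linalg.norm(diff)
        # one unit-speed step toward the leader; land on it if it is closer than a step
        out[i + 1] = aim if dist <= dt else out[i] + dt * diff / dist
    return out

def length(traj):
    return float(np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1)))

dt, lag, n_samples = 0.01, 20, 300            # Delta = lag * dt = 0.2
trails = [polyline([(0.0, 0.0), (0.5, 0.5), (1.0, 0.0)], dt, n_samples)]
for _ in range(30):
    trails.append(follow(trails[-1], lag, dt))

deviation = [float(np.max(np.abs(t[:, 1]))) for t in trails]  # distance from the segment
print(deviation[0], deviation[-1])
print(length(trails[0]), length(trails[-1]))  # lengths shrink toward the straight-line value 1
```

The discretization only approximates the continuous model of [26], but it shows the qualitative behavior: every ant reaches the target, and successive trails become shorter and flatter.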


2.3 Distributed Covering and Searching 

Inspired by the fact that ants and other insects use pheromones for various communication and coordination tasks, [28] developed robust adaptive algorithms for tasks requiring the traversal of an unknown region, such as cleaning the floor of an unmapped building. The region to be covered is described by a graph $G = (V, E)$, where every vertex represents an "atomic region" (tile). As the agents travel on $G$, they mark their trails by depositing a pheromone, which evaporates as time goes on. By this mechanism, the agents can label each edge of the graph, which represents the neighborhood relation between two "atoms", with the time of its most recent traversal. An agent visiting vertex $u \in V(G)$ checks the labels on all edges emanating from $u$ and then moves along the edge that has not been traversed for the longest time, i.e. the edge with the smallest label. The time $t_k$ needed to cover all edges of the graph by $k$ agents under the "ANT-WALK-1" rule based on this idea is bounded as



where $\Delta$ is the maximum vertex degree in $G$, $n = |V(G)|$, $a$ is related to the measurement noise and $p(G)$ is the cut-resistance of $G$. In the same work, the "ANT-WALK-2" rule, a generalization of the well-known depth-first search algorithm, was developed for agents with limited amounts of memory. The time $t_k$ for this rule is bounded as



where the notation is as before.
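The smallest-label rule of ANT-WALK-1 can be sketched on a small grid of tiles. This is a simplified illustration, not the exact formulation of [28]: it assumes noise-free labels, synchronous agent moves, timestamps standing in for evaporating pheromone, and ties broken by neighbor order; the helper names are hypothetical.

```python
def grid_graph(n):
    """Adjacency lists of an n x n grid of tiles (4-neighborhood)."""
    adj = {}
    for i in range(n):
        for j in range(n):
            adj[(i, j)] = [(i + di, j + dj)
                           for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                           if 0 <= i + di < n and 0 <= j + dj < n]
    return adj

def edge_ant_walk(adj, starts, max_steps=10_000):
    """Number of synchronous steps until every edge has been traversed (None if
    max_steps is exceeded). Each edge stores the time of its last traversal;
    untraversed edges count as the oldest."""
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    label = {}
    pos = list(starts)
    for t in range(max_steps):
        if len(label) == len(edges):
            return t
        for a, u in enumerate(pos):
            # smallest label = edge not traversed for the longest time
            v = min(adj[u], key=lambda w: label.get(frozenset((u, w)), float("-inf")))
            label[frozenset((u, v))] = t
            pos[a] = v
    return None

adj = grid_graph(3)                                   # 9 tiles, 12 edges
t1 = edge_ant_walk(adj, [(0, 0)])
t2 = edge_ant_walk(adj, [(0, 0), (2, 2)])
print(t1, t2)
```

Since an untraversed edge always has the smallest label, every vertex visited repeatedly eventually sends an agent along all of its edges, which is why the walk covers a connected graph in finite time.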

The work of [29] investigated the performance of cooperative strategies that control autonomous air vehicles searching a dynamic environment to gather information. The proposed framework considers two main components for each agent: distributed learning of the environment and distributed path planning based on the information gathered. The collective results, based on a recursive q-step-ahead planning technique as well as an interleaved planning technique, illustrate that cooperation among vehicles improves performance. The authors also explored the feasibility of developing coordination control strategies inspired by the social foraging activities of E. coli, a common bacterium.

2.4 Distributed Localization and Estimation 

The study of [30, 31] proposed a method called the "Cooperative Positioning System (CPS)" to alleviate the weaknesses of traditional position identification techniques used in robotics, including dead reckoning and landmark-based localization. In that work, a robot group is divided into two teams in order to provide "portable landmarks". At each step, one team moves while the other stays static and acts as the landmark; then they exchange roles. Therefore, each team can benefit from accurate measurements by using static landmarks, while at the same time no prior placement of landmarks is required. The drawback is that at least one robot must stay stationary, which restricts the overall speed of the algorithm.

Another approach has been presented in [32] to simultaneously localize a group of mobile robots with respect to one another. Each robot measures its own motion using its proprioceptive sensors. When two robots $x_i, x_j$ meet, they share information, and the i th robot updates the estimate of its own position using the j th robot's position estimate and the estimated relative distance between the two robots. The resulting propagation and update steps are described by the Kalman filter equations in [32]. This method distributes what would be a centralized estimation process among M Kalman filters, each operating on a different robot.
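The propagate-then-update cycle can be illustrated with a one-dimensional scalar Kalman filter. This is a deliberately simplified sketch, not the formulation of [32]: positions are scalars, the relative measurement is assumed to be $z = x_i - x_j$ plus noise, and the cross-correlation between the two robots' estimates, which a full treatment must track, is ignored. All names and numbers are hypothetical.

```python
def propagate(x, var, odometry, odometry_var):
    """Dead-reckoning step: add the measured displacement; uncertainty grows."""
    return x + odometry, var + odometry_var

def update_on_meeting(x_i, var_i, x_j, var_j, z, z_var):
    """Robot i corrects its estimate using robot j's estimate and a measurement
    z = x_i - x_j + noise (variance z_var). Treats the two estimates as uncorrelated."""
    innovation = z - (x_i - x_j)          # measured minus predicted relative position
    s = var_i + var_j + z_var             # innovation variance
    gain = var_i / s                      # Kalman gain for robot i's position
    return x_i + gain * innovation, (1.0 - gain) * var_i

# Illustrative 1-D numbers
x_i, var_i = propagate(0.0, 0.5, 0.0, 0.5)    # robot i: position 0, variance grows to 1
x_j, var_j = 5.0, 1.0                          # robot j's current estimate
x_new, var_new = update_on_meeting(x_i, var_i, x_j, var_j, z=-4.0, z_var=2.0)
print(x_new, var_new)   # 0.25 0.75
```

Here the measurement implies robot i is at 1.0 rather than 0.0, and the gain of 0.25 moves the estimate a quarter of the way while reducing its variance.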


2.5 Group Formations 

The work of [33] derived a framework that allows robots equipped with range sensors to control their states in order to accomplish search or rescue tasks. The authors derived three formation controllers, "Separation-Bearing Control", "Separation-Separation Control" and "Separation Distance-To-Obstacle Control", defined with respect to neighboring robots or obstacles in the environment. The "basic formation" framework is constructed from these formation controllers and is proved to stabilize the formation of a robot team. Lastly, that work outlined a coordination strategy allowing switches between control policies to maintain the formation in situations with constraints on the sensors, actuators and the environment.

A smooth time-varying feedback control law is developed in [34] to organize formations of multiple nonholonomic wheeled mobile robots. Each robot $R_i$ senses the relative positions of its neighboring robots in its own coordinate system. The formation is described by a vector called the "formation vector". Because asymptotic stability is hard to obtain for robots with nonholonomic constraints via smooth static state feedback, the authors used a time-varying feedback control law to obtain the desired velocity for each agent. Using an analytical method based on averaging theory, the group formation under this mechanism is proved to be asymptotically stable.

Another coordination strategy for vehicle group maneuvers, including translation, rotation, expansion and contraction, is presented in [42] through the construction of artificial potentials and virtual leaders. The control applied to each vehicle is defined as a linear combination of the gradients of these potentials plus a linear damping term:

$$u_i = -\sum_{j \ne i}^{N} \nabla_{x_i} V_I(x_{ij}) - \sum_{k=1}^{M} \nabla_{x_i} V_h(h_{ik}) - K\dot{x}_i$$

where $u_i = \ddot{x}_i$ is the control, $x_{ij}$ is the distance between the i th vehicle and the j th vehicle, and $h_{ik}$ is the distance between the i th vehicle and virtual leader k. The artificial potentials provide attraction to distant neighbors as well as repulsion from neighbors that are too close. The desired mission is accomplished by controlling the direction of the virtual leaders' motion, while the speed of the virtual leaders is chosen to ensure convergence of the formation. The convergence property is proved by Lyapunov's method.
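The structure of this control law (potential gradients plus damping) can be sketched for three planar vehicles and one static virtual leader. The quadratic potentials $V_I(r) = (r - d_0)^2/2$ and $V_h(h) = (h - h_0)^2/2$, unit masses and the semi-implicit Euler integration are hypothetical choices for illustration; [42] uses its own potentials, which this sketch does not reproduce.

```python
import numpy as np

def grad_quadratic(r_vec, rest):
    """Gradient w.r.t. x_i of V = 0.5 * (|r_vec| - rest)**2, where r_vec = x_i - x_other."""
    r = np.linalg.norm(r_vec)
    return (r - rest) * r_vec / r

def simulate(x, leader, d0, h0, K=1.0, dt=0.01, steps=5000):
    """Double integrators with u_i = -sum_j grad V_I - grad V_h - K * xdot_i."""
    x = np.array(x, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        u = -K * v                                            # linear damping term
        for i in range(len(x)):
            for j in range(len(x)):
                if j != i:
                    u[i] -= grad_quadratic(x[i] - x[j], d0)   # neighbor potential V_I
            u[i] -= grad_quadratic(x[i] - leader, h0)         # virtual-leader potential V_h
        v += dt * u                                           # semi-implicit Euler step
        x += dt * v
    return x, v

d0 = 1.0
h0 = d0 / np.sqrt(3.0)      # circumradius of an equilateral triangle with side d0
x, v = simulate([[0.7, 0.0], [-0.2, 0.6], [-0.3, -0.5]], np.zeros(2), d0, h0)
pair = [np.linalg.norm(x[a] - x[b]) for a, b in ((0, 1), (0, 2), (1, 2))]
print(pair, np.linalg.norm(x, axis=1))
```

The rest lengths were chosen to be mutually consistent, so the vehicles settle into an equilateral triangle centered on the virtual leader; the damping term removes the kinetic energy, in the spirit of the Lyapunov argument.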

2.6 Biologically-Motivated Optimization 

The work of [36] introduced a search methodology based on a "distributed autocatalytic process" to solve a classical optimization problem, the Traveling Salesman Problem (TSP). Inspired by the fact that ants can use pheromonal secretions to find the shortest path when foraging, [36] used a team of ants to travel through the towns of the TSP. The transition probability from town i to town j for the k th ant is defined as

$$p_{ij}^k(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{l \in \mathrm{allowed}_k} [\tau_{il}(t)]^{\alpha}\,[\eta_{il}]^{\beta}}, & j \in \mathrm{allowed}_k \\ 0, & \text{otherwise} \end{cases} \qquad (2.1)$$

where $\mathrm{allowed}_k$ is the set of towns reachable by ant k and $\tau_{ij}(t)$ is the intensity of the pheromone trail on edge (i, j) at time t, which is laid by ants on the edge and evaporates as time goes on. The visibility of the path, $\eta_{ij}$, is defined as the reciprocal of the distance $d_{ij}$ between town i and town j, i.e. $\eta_{ij} = 1/d_{ij}$. Lastly, $\alpha$ and $\beta$ are parameters controlling the relative importance of the trail and the visibility, respectively. Based on Eq. (2.1), [36] developed three algorithms, "ant-cycle", "ant-density" and "ant-quantity", each based on a slightly different rule by which ants update $\tau_{ij}(t)$ along their trails. The trajectories of the ant team in each algorithm eventually converge to the optimal tour for the TSP.
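Eq. (2.1) can be sketched as follows for a single ant building one tour. The town coordinates, the uniform initial trail intensity and the values α = 1, β = 5 are illustrative assumptions; the trail update rules that distinguish ant-cycle, ant-density and ant-quantity are omitted.

```python
import numpy as np

def transition_probs(i, tau, eta, allowed, alpha=1.0, beta=5.0):
    """Eq. (2.1): probability of moving from town i to each town j for one ant."""
    idx = np.array(sorted(allowed), dtype=int)
    weights = tau[i, idx] ** alpha * eta[i, idx] ** beta
    p = np.zeros(tau.shape[0])
    p[idx] = weights / weights.sum()          # towns outside allowed_k keep probability 0
    return p

def build_tour(start, tau, eta, rng, alpha=1.0, beta=5.0):
    """One ant constructs a tour, sampling each next town from Eq. (2.1)."""
    n = tau.shape[0]
    tour, allowed = [start], set(range(n)) - {start}
    while allowed:
        p = transition_probs(tour[-1], tau, eta, allowed, alpha, beta)
        nxt = int(rng.choice(n, p=p))
        tour.append(nxt)
        allowed.remove(nxt)
    return tour

# Hypothetical instance: five towns, uniform initial trail intensity.
towns = np.array([[0, 0], [1, 0], [2, 1], [1, 2], [0, 1]], dtype=float)
d = np.linalg.norm(towns[:, None, :] - towns[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)           # no self-loops, so eta_ii = 0
eta = 1.0 / d                         # visibility eta_ij = 1 / d_ij
tau = np.ones_like(d)

p0 = transition_probs(0, tau, eta, {1, 2, 3, 4})
tour = build_tour(0, tau, eta, np.random.default_rng(1))
print(p0.round(3))
print(tour)
```

With uniform trails the choice is driven by visibility alone, so from town 0 the two nearest towns receive almost all of the probability; as trails are reinforced, $\tau_{ij}$ shifts this balance toward edges of good tours.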

The "probabilistic pursuit" algorithm for a group of agents moving on a planar grid was presented in [27]. Briefly, a sequence of agents $A_0, A_1, \ldots$ leave the origin at times $t = 0, \Delta, 2\Delta, \ldots$ and move toward a destination. While moving on the grid, $A_{n+1}$ "chases" $A_n$ by making a random choice of a neighboring grid point and moving there. The probability distribution that defines the agent's choice is determined by its position relative to its predecessor, that is

$$A_{n+1}(t+1) = A_{n+1}(t) + \delta_{n+1}(t+1)$$

where $\delta_{n+1}(t+1) \in \{1, -1, j, -j\}$ (grid points are identified with complex numbers, $j = \sqrt{-1}$) and

$$\mathrm{Prob}\{\delta_{n+1}(t+1) = \mathrm{sign}(d_x)\} = \frac{|d_x|}{d}, \qquad \mathrm{Prob}\{\delta_{n+1}(t+1) = j \cdot \mathrm{sign}(d_y)\} = \frac{|d_y|}{d}$$

where $A_n(t)$ is the position of the n th agent at time t, $d = |d_x| + |d_y|$, and $d_x$, $d_y$ are the relative distances between $A_n$ and $A_{n+1}$ in the x and y directions, respectively. Analysis and simulations show that the average trajectories of the agents converge to a straight line on the plane. This work is related to the problem of discovering optimal trajectories that will be the focus of this research, although it is restricted to a discretized plane.
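The probabilistic pursuit rule maps naturally onto Python's complex numbers, which play the role of the grid points above. The sketch assumes a hypothetical first agent that follows a "staircase" path, a departure lag of 3 steps, and that an agent which lands on its predecessor simply stays put for that step.

```python
import random

def sign(v):
    return (v > 0) - (v < 0)

def pursuit_step(chaser, target, rng):
    """One move of A_{n+1} toward A_n; grid points are complex numbers x + y*1j."""
    dx = int(target.real - chaser.real)
    dy = int(target.imag - chaser.imag)
    d = abs(dx) + abs(dy)
    if d == 0:
        return chaser                          # on top of the predecessor: stay put
    if rng.random() < abs(dx) / d:
        return chaser + sign(dx)               # x-step with probability |d_x| / d
    return chaser + 1j * sign(dy)              # y-step with probability |d_y| / d

def probabilistic_pursuit(first_path, n_agents, lag, horizon, seed=0):
    """paths[n][t] is agent n's position at absolute time t; agent n leaves the
    origin at time n * lag and chases agent n - 1's position at the same time."""
    rng = random.Random(seed)
    lead = first_path + [first_path[-1]] * (horizon + 1 - len(first_path))
    paths = [lead]
    for n in range(1, n_agents):
        prev = paths[-1]
        path = [first_path[0]] * (horizon + 1)  # waits at the origin until departure
        for t in range(n * lag, horizon):
            path[t + 1] = pursuit_step(path[t], prev[t], rng)
        paths.append(path)
    return paths

# Hypothetical first agent: a staircase from 0 to 10 + 10j (all x-steps, then all y-steps).
first = [complex(k, 0) for k in range(11)] + [complex(10, k) for k in range(1, 11)]
paths = probabilistic_pursuit(first, n_agents=20, lag=3, horizon=110)
# Largest distance from the diagonal y = x, for the first and the last agent.
print([max(abs(z.real - z.imag) for z in p) for p in (paths[0], paths[-1])])
```

Each step reduces the L1 distance to the predecessor's current position, so that distance never grows beyond the initial lag and every agent reaches the destination; the straightening only holds on average, as in [27].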
Chapter 3

Biologically Inspired Algorithms for 
Optimal Control 


In this chapter we introduce a class of algorithms inspired by ant-trail formation and discuss their potential advantages. Recall that there are already several effective models of ant-trail formation [1, 26], which explain how a colony of ants finds the shortest path between two points and which have already seeded some applications [36, 28]. We are particularly interested in the simplicity of the model in [26]. However, [26] only applies to a very narrow domain ($\mathbb{R}^2$ with holonomic, kinematic vehicles). We would like to extend it to a much broader class of optimization problems, including many classical problems in optimal control. Before proceeding with the algorithms, we describe the precise problems we are concerned with.


3.1 Problem Statement and Notation 

For our purposes, the agents are assumed to be a number of "copies" of a dynamical system, i.e. for $k = 0, 1, 2, \ldots$

$$\dot{x}_k = f(x_k, u_k), \qquad x_k(t) \in \mathbb{R}^n, \quad u_k(t) \in \Omega \subset \mathbb{R}^m \qquad (3.1)$$

Physically, each copy of Eq. (3.1) could stand for a robot, UAV or other autonomous system.

We consider here some classical trajectory optimization problems with fixed end points for systems evolving under Eq. (3.1). Each function $x_k(t) : [0, T] \to \mathbb{R}^n$ represents a trajectory defined by the agent's movement. For simplicity, let us start with the problem with fixed final time.
Fixed Final Time Problems


Assume the starting state $x_0$ and the target state $x_f$ are equilibrium points of Eq. (3.1) for $u = 0$ 1, i.e.

$$\dot{x}_k(t) = f(x_k(t), 0) = 0 \quad \text{if } x_k(t) \in \{x_0, x_f\}$$

The problem we are concerned with is finding a trajectory $x^*(t)$ that minimizes the cost function

$$J(x, \dot{x}, t_0, T) = \int_{t_0}^{t_0+T} g(x(t), \dot{x}(t), t)\, dt \qquad (3.2)$$

with $x(t_0) = x_0$, $x(t_0 + T) = x_f$ and subject to $\dot{x} = f(x, u)$.

This cost function covers various categories of optimal control problems, e.g. $g(x(t), \dot{x}(t), t) = \|\dot{x}(t)\|$ (length minimization).

Let $B \subset \mathbb{R}^n$ be a domain containing states $a$ and $b$. Assume $0 < \sigma \le T$ and $t_0 \ge 0$. The optimal trajectory from $a$ to $b$ in a fixed T units of time is defined to be the $x^*(t)$ ($t \in [t_0, t_0 + T]$) satisfying

$$J(x^*, \dot{x}^*, t_0, T) = \min J(x, \dot{x}, t_0, T) \quad \text{subject to } x(t_0) = a,\ x(t_0 + T) = b \qquad (3.3)$$

For notational convenience, we define the cost of following $x^*(t)$ for $\sigma$ units of time as

$$\eta(a, b, T, t_0, \sigma) = \int_{t_0}^{t_0+\sigma} g(x^*(t), \dot{x}^*(t), t)\, dt, \qquad \sigma \le T \qquad (3.4)$$

where the optimal trajectory $x^*(t)$ is defined in Eq. (3.3).
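As a simple worked case, chosen here only for illustration, take single-integrator dynamics $\dot{x} = u$ with the energy cost $g = \|\dot{x}\|^2$. For any admissible trajectory, the Cauchy-Schwarz inequality gives $\|b - a\|^2 = \|\int_{t_0}^{t_0+T} \dot{x}\, dt\|^2 \le T \int_{t_0}^{t_0+T} \|\dot{x}\|^2 dt$, with equality exactly when $\dot{x}$ is constant. Hence the optimal trajectory of Eq. (3.3) is the straight segment traversed at constant velocity, and Eq. (3.4) evaluates in closed form:

$$x^*(t) = a + \frac{t - t_0}{T}(b - a), \qquad \eta(a, b, T, t_0, \sigma) = \int_{t_0}^{t_0+\sigma} \frac{\|b - a\|^2}{T^2}\, dt = \frac{\sigma\,\|b - a\|^2}{T^2}$$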

For a generic trajectory $x(t)$, we define

$$C(x, t_0, \sigma) = \int_{t_0}^{t_0+\sigma} g(x(t), \dot{x}(t), t)\, dt \qquad (3.5)$$

to be the cost incurred along $x(t)$ during $[t_0, t_0 + \sigma)$.
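In practice, trajectories are often available only as samples, and Eq. (3.5) must be approximated numerically. The sketch below assumes uniform sampling with step dt, forward-difference velocities and a left Riemann sum; the function names are hypothetical, and the two running costs are the length and energy examples.

```python
import numpy as np

def trajectory_cost(samples, t0, dt, g):
    """Approximate C(x, t0, sigma) of Eq. (3.5) for a trajectory sampled every dt on
    [t0, t0 + sigma]: forward-difference velocities and a left Riemann sum."""
    x = np.asarray(samples, dtype=float)
    xdot = np.diff(x, axis=0) / dt
    times = t0 + dt * np.arange(len(xdot))
    return dt * sum(g(xi, vi, ti) for xi, vi, ti in zip(x[:-1], xdot, times))

def length_cost(x, xdot, t):
    return np.linalg.norm(xdot)       # g = ||xdot||: path length

def energy_cost(x, xdot, t):
    return float(xdot @ xdot)         # g = ||xdot||^2: control energy for xdot = u

# Straight line from (0, 0) to (3, 4) traversed in T = 1 time unit, 101 samples.
T, n = 1.0, 100
line = np.linspace([0.0, 0.0], [3.0, 4.0], n + 1)
c_len = trajectory_cost(line, 0.0, T / n, length_cost)
c_energy = trajectory_cost(line, 0.0, T / n, energy_cost)
print(c_len, c_energy)   # close to 5 (the distance) and 25 (= 5**2 / T)
```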

Free Final Time Problems 

Consider a class of optimal control problems with free final time (such as minimum-time control), where we are trying to find a trajectory $x^*(t)$ and a best final time $T > 0$ that minimize the cost function

$$J_F(x, \dot{x}, t_0) = \int_{t_0}^{t_0+T} g(x(t), \dot{x}(t), t)\, dt \qquad (3.6)$$

with the restriction that $x(t_0) = x_0$, $x(t_0 + T) = x_f$ and $T > 0$. The cost of the optimal trajectory $x^*(t)$ ($t \in [t_0, t_0 + T^*]$, where $T^*$ is the optimal final time) from $a$ to $b$ is defined as

$$J_F(x^*, \dot{x}^*, t_0) = \min J_F(x, \dot{x}, t_0) \quad \text{with } x(t_0) = a,\ x(t_0 + T) = b \text{ over all } T > 0 \qquad (3.7)$$

The cost of following the optimal trajectory for $\sigma$ units of time is defined as

$$\eta_F(a, b, t_0, \sigma) = \int_{t_0}^{t_0+\sigma} g(x^*(t), \dot{x}^*(t), t)\, dt, \qquad \sigma \le T^* \qquad (3.8)$$

where $x^*(t)$ is defined in Eq. (3.7).

1 Otherwise we can assume there exist $u_0$ and $u_f$ such that $f(x_0, u_0) = f(x_f, u_f) = 0$.

3.2 A Class of Bio-Inspired Pursuit Algorithms 

In the model of ant-trail formation described in [26], each ant tries to catch up with its predecessor on $\mathbb{R}^2$ in the most "efficient" way, namely by pointing its velocity vector toward its predecessor. The trajectories generated by the movements of the ants are gradually optimized, and the trajectory sequence converges to a straight line in $\mathbb{R}^2$, as illustrated in Fig. 3.1. The work in [18] extended this approach to uneven terrains. Both [18] and [26] separate the task of finding a geodesic over long distances into many simpler tasks of seeking geodesics connecting nearby points. The difficulty of "following" increases with the distance between the predecessor and the successor and with the complexity of the terrain. It is easier for an ant to aim at its leader and move along a shortest path toward it if the two are close, whether on a plane or on a terrain. The same holds for a robot that has limited sensing range and computing ability.

Figure 3.1: A geodesic discovery process on a plane.
Figure 3.2: It is easier to solve an optimization problem within a "small" region.



Figure 3.3: Expanding the algorithm to more general optimization problems on a manifold. 

We are interested in generalizing the existing approaches in [26, 18] to the much broader class of optimization problems in Eqs. (3.3) and (3.6), using an iterative strategy requiring little communication and only short-range sensing. There is an analogy here between optimal control problems, which are easier to solve when the boundary conditions are "close" to one another, and members of a collective, which are easier to follow from a close distance, as Fig. 3.2 illustrates. Our idea is to seek optimal trajectories locally, by means of "local pursuit", and to combine the efforts of a group of agents to gradually optimize an initial solution.

Our approach will be to propose a set of iterative rules that generalize the idea of pursuit to settings with non-trivial geometry and agents with non-trivial dynamics. If this approach succeeds, complicated tasks could be separated into simpler tasks and accomplished by a group of "inexpensive" agents. The following algorithm prescribes the evolution of a group, given an initial feasible trajectory.

Algorithm 1 (Sampled Local Pursuit): Identify two states $x_0$ and $x_f$ on B. Let $x_0(t)$ ($t \in [0, T]$) be an initial trajectory satisfying Eq. (3.1) with $x_0(0) = x_0$, $x_0(T) = x_f$. Choose the following interval $\Delta$ and the updating interval $\delta$ such that $0 < \delta < \Delta < T$. Then follow the next rules for the k th agent.

1. For $k = 1, 2, 3, \ldots$, let $t_k = k\Delta$ be the starting time of the k th agent. Let $u_k(t) = 0$, $x_k(t) = x_0$ for $0 \le t \le t_k$.

2. When $t = t_k + i\delta$, $i = 0, 1, 2, 3, \ldots$, calculate $u^*_t(\tau)$ such that $f(x_k(\tau), u^*_t(\tau)) = \dot{x}^*_k(\tau)$, where

$$x^*_k(\tau) \text{ achieves } \begin{cases} \eta(x_k(t), x_{k-1}(t), \Delta, t, \Delta), & \tau \in [t, t + \Delta], \text{ if } \Delta + i\delta < T \\ \eta(x_k(t), x_f, t_k + T - t, t, t_k + T - t), & \tau \in [t, t_k + T], \text{ otherwise} \end{cases}$$

3. Apply $u_k(t) = u^*_{t_k + i\delta}(t - t_k - i\delta)$ to the k th agent for $t \in [t_k + i\delta, t_k + (i+1)\delta)$ if $\Delta + i\delta < T$, or for $t \in [t_k + i\delta, t_k + T)$ otherwise.

Repeat from step 2 until the k th agent reaches $x_f$.

Figure 3.4: A snapshot of the updating processes executed by the k th agent (follower $x_k$, leader $x_{k-1}$, updates at $t = t_k, t_k + \delta, t_k + 2\delta$).

This is a "sampled" version of local pursuit because agents are only required to update their trajectories a finite number of times. There are two adjustable parameters: the "following interval" $\Delta$ and the "updating interval" $\delta$. Usually we take $0 < \delta < \Delta$. We will refer to the times $t^i_k = t_k + i\delta$, $i = 0, 1, 2, 3, \ldots$ as the "updating times". Notice that the SLP algorithm yields a well-defined trajectory $x_k(t)$ on $[0, T]$, given $x_{k-1}(t)$. The resulting trajectory is continuous but not necessarily smooth on the time interval $[t_k, t_k + T]$. A snapshot of the iterative updating process is illustrated in Fig. 3.4.

According to the SLP algorithm, agents leave the starting state $x_0$ one after another, each $\Delta$ units of time after its predecessor. That is, if the (k-1) th agent leaves the starting state at time $t_{k-1}$, the k th agent will leave it at $t_k = t_{k-1} + \Delta$. We assume the number of agents in the group is large and label each agent by an integer k, so that we can use $x_k(t)$ to denote the k th agent's trajectory 2. Each agent moves to pursue its predecessor. If we denote the (k-1) th agent as the "leader" in this pursuit relationship, the k th agent will be denoted as the "follower". At each $t = t^i_k$, the follower calculates the optimal control $u^*_t(\tau)$ ($\tau \in [t, t + \Delta]$) that steers it from $x_k(t)$ to $x_{k-1}(t)$ over $\Delta$ units of time, i.e. from its current state to the leader's current state. Then, during $[t_k + i\delta, t_k + (i+1)\delta]$, the follower moves along the trajectory driven by $u^*_t$, and the process repeats until the follower reaches $x_f$.

For notational convenience, we define the planned trajectories, denoted by $\hat{x}(t)$, to be the trajectories along which the follower plans to move at $t_k + i\delta$ but may not actually do so, because it will update its future trajectory at $t_k + (i+1)\delta$. In other words, a planned trajectory is the trajectory driven by $u^*_{t_k+i\delta}(\tau)$ over the time period $[t_k + (i+1)\delta, t_k + i\delta + \Delta]$, which may not actually be executed because the next updating result, $u^*_{t_k+(i+1)\delta}(\tau)$, will drive the agent along a different trajectory. The realized trajectories, denoted by $x(t)$, are defined as the trajectories along which the follower actually moves. Referring to Fig. 3.4, the planned and realized trajectories are represented by the dashed and solid lines, respectively.

If we let $\delta \to 0$ in SLP, we obtain the following continuous local pursuit algorithm:


Algorithm 2 (Continuous Local Pursuit): Identify two states $x_0$ and $x_f$ on $B$. Let $x_0(t)$ ($t \in [0, T]$) be an initial trajectory satisfying Eq. (3.1) with $x_0(0) = x_0$, $x_0(T) = x_f$. Choose the following interval $\Delta$ such that $0 < \Delta < T$. Then apply the following rules to the $k$th agent.

1. For $k = 1, 2, 3, \ldots$, let $t_k = k\Delta$ be the starting time of the $k$th agent. Let $u_k(t) = 0$ and $x_k(t) = x_0$ for $0 \le t < t_k$.

2. Calculate $u_t^*(\tau)$ for all $t \in [t_k, t_k + T]$ such that $f(x_k(\tau), u_t^*(\tau)) = \dot{x}_k^*(\tau)$, where $x_k^*(\tau)$ achieves

$$\begin{cases} \eta(x_k(t), x_{k-1}(t), \Delta, t, \Delta), & \tau \in [t, t + \Delta], \text{ if } t < t_k + T - \Delta \\ \eta(x_k(t), x_f, t_k + T - t, t, t_k + T - t), & \tau \in [t, t_k + T], \text{ otherwise} \end{cases}$$

3. Apply $u_k(t) = u_t^*(t)$ to the $k$th agent.

Repeat from step 2 until the $k$th agent reaches $x_f$.
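A forward-Euler discretization gives a rough feel for CLP: shrinking the update step to a small time step $dt$ turns step 3 into integrating the current optimal control for one step. The sketch below assumes the minimum path length problem in $\mathbb{R}^2$ with straight-line local optima, so the follower’s velocity is $(x_{k-1}(t) - x_k(t))/\Delta$ while $t < t_k + T - \Delta$ and $(x_f - x_k(t))/(t_k + T - t)$ afterwards. The initial trajectory, the grid size and the names `clp_follower` and `path_length` are hypothetical.

```python
import numpy as np


def path_length(xy):
    """Length of the polyline through the rows of xy (the path length cost)."""
    return float(np.sum(np.linalg.norm(np.diff(xy, axis=0), axis=1)))


def clp_follower(s_grid, leader, x_f, Delta):
    """Forward-Euler approximation of one CLP iteration (minimum path length in R^2).

    s_grid: times relative to each agent's own start, from 0 to T.
    leader: the predecessor's trajectory sampled on s_grid.
    """
    T = s_grid[-1]

    def leader_at(s):
        # The leader started Delta earlier, so its current state is its state at s + Delta.
        return np.array([np.interp(s, s_grid, leader[:, d]) for d in range(leader.shape[1])])

    follower = np.empty(leader.shape, dtype=float)
    follower[0] = leader[0]                                  # start at x_0
    for j in range(len(s_grid) - 1):
        s, dt = s_grid[j], s_grid[j + 1] - s_grid[j]
        pos = follower[j]
        if s < T - Delta:
            velocity = (leader_at(s + Delta) - pos) / Delta  # reach the leader in Delta
        else:
            velocity = (x_f - pos) / (T - s)                 # reach x_f by t_k + T
        follower[j + 1] = pos + dt * velocity                # apply u_t^*(t) for one step
    return follower


s = np.linspace(0.0, 1.0, 501)                        # T = 1 (hypothetical)
x_f = np.array([1.0, 0.0])                            # hypothetical target state
traj = np.column_stack([s, 0.5 * np.sin(np.pi * s)])  # hypothetical initial trajectory
lengths = [path_length(traj)]
for k in range(30):
    traj = clp_follower(s, traj, x_f, Delta=0.5)
    lengths.append(path_length(traj))
print([round(c, 4) for c in lengths[:6]])
```

Each Euler step is exactly an SLP update with $\delta = dt$, which mirrors how Algorithm 2 arises from SLP as $\delta \to 0$.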


Because each agent has limited computing capability, it may be more expedient to apply sampled local pursuit (SLP), since the agents then only need to update their trajectories a finite number of times instead of continuously as in CLP. However, CLP does not require storage of calculated results, so it is preferred in situations where the update is easy to carry out.

² From now on, we will use $x_k(t)$ to denote both the $k$th agent and its trajectory.

If we are dealing with a free final time optimization problem, then the SLP and CLP algorithms must be altered so that agents optimize the trajectories connecting them pairwise with respect to both $u$ and the final time.

Continuous local pursuit is therefore altered as follows³.

Algorithm 3 (Free Final Time Local Pursuit): In Algorithm 2, replace step 2 with:
2′. Calculate $u_t^*(\tau)$ for all $t \in [t_k, t_k + T]$ such that $f(x_k(\tau), u_t^*(\tau)) = \dot{x}_k^*(\tau)$, and $x_k^*(\tau)$ achieves $\eta_F(x_k(t), x_{k-1}(t), t, \Gamma)$ ($\tau \in [t, t + \Gamma]$), where $\eta_F$ is given by Eq. (3.8).
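For the minimum time problem with $\|\dot{x}\| \le 1$ in an obstacle-free plane (an assumed setting), the free-final-time optimum from the follower to the leader’s current state is the unit-speed straight line toward it, so step 2′ reduces to a classical pursuit rule. The sketch below discretizes this with a time step $dt$; the corner-shaped first trajectory, $\Delta = 0.3$, and the helper names `unit_speed_samples`, `fftlp_follower` and `path_length` are hypothetical.

```python
import numpy as np


def unit_speed_samples(waypoints, dt):
    """Sample a polyline so that consecutive samples are at most dt apart (unit speed)."""
    w = np.asarray(waypoints, dtype=float)
    cum = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(w, axis=0), axis=1))])
    s = np.append(np.arange(0.0, cum[-1], dt), cum[-1])
    return np.column_stack([np.interp(s, cum, w[:, d]) for d in range(w.shape[1])])


def fftlp_follower(leader, x_f, lag_steps, dt, tol=1e-9, max_steps=1_000_000):
    """At every step, move at unit speed straight toward the leader's current state.

    leader[i] is the leader's state i*dt after it started; the follower starts
    lag_steps later, and the leader rests at x_f once its samples run out.
    """
    x_f = np.asarray(x_f, dtype=float)
    pos = leader[0].astype(float)
    path = [pos.copy()]
    j = 0
    while np.linalg.norm(x_f - pos) > tol and j < max_steps:
        i = j + lag_steps
        target = leader[i] if i < len(leader) else x_f
        gap = target - pos
        dist = np.linalg.norm(gap)
        if dist > 0:
            pos = pos + gap * min(dt, dist) / dist  # unit speed, never overshoot
        path.append(pos.copy())
        j += 1
    return np.array(path)


def path_length(xy):
    return float(np.sum(np.linalg.norm(np.diff(xy, axis=0), axis=1)))


dt, Delta = 1e-3, 0.3
x_f = np.array([1.0, 0.0])
paths = [unit_speed_samples([(0.0, 0.0), (0.5, 0.5), (1.0, 0.0)], dt)]  # hypothetical first agent
for k in range(8):
    paths.append(fftlp_follower(paths[-1], x_f, round(Delta / dt), dt))
lengths = [path_length(p) for p in paths]
print([round(c, 3) for c in lengths])
```

Because each step moves straight toward a target that itself moves along the leader’s path, a triangle-inequality argument shows that each follower’s path is no longer than its leader’s, so the printed lengths should not increase along the chain.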


3.3 Algorithm Advantages 

Each agent that participates in local pursuit is only required to calculate the optimal trajectory from itself to its nearby leader. Meanwhile, the “distance” between them can be limited by selecting an appropriate following interval $\Delta$. Therefore every agent only needs to sense the environment within a limited region during the pursuit. This is preferable to obtaining a global map via random exploration with limited sensor range. For example, it would be difficult and wasteful for a single robot to obtain the entire map of an unknown terrain. Even if a group of agents is dispersed and each composes a map “patch” around itself, there is no guarantee that these patches together cover the whole environment, or even the region containing the optimal trajectory.

Even if enough patches covering the entire environment have been collected, fusing them into a composite map still requires a large amount of communication. A powerful agent is also needed to stitch the scattered maps together using sophisticated fusion algorithms. This means at least one agent in the group must have enough memory, communication bandwidth and computing power to collect and fuse information about the entire environment. In contrast, local pursuit does not require agents to exchange the local maps they sense. Agents only have to communicate in very limited ways, by using vision to track one another or by signaling their locations in primitive ways, e.g. sound or radio emission.

Furthermore, even if an effective map could be obtained, solving optimization problems over a complex environment, especially one containing different kinds of coordinate patches, requires a large amount of computation. One example is finding geodesics on terrain with mountains and basins. The technique most often used in such situations is a numerical method. As we shall see later, applying numerical methods over long distances may lead to a huge amount of computation. Local pursuit, however, only requires computing optima within small regions, so fewer calculations are needed.

³ In SLP, it cannot be guaranteed that at every updating time the minimum time $\Gamma$ to reach the leader is greater than or equal to the updating interval $\delta$. If $\Gamma < \delta$, extra costs might be incurred. For this reason, we develop Free Final Time Local Pursuit (FFTLP) only in its continuous version, so that $\delta < \Gamma$ is guaranteed.

In summary, local pursuit provides a way to obtain a locally optimal trajectory⁴ over long distances from many short pieces generated by an ordered sequence of identical agents, while requiring only local knowledge of the environment and the computation of optimal trajectories within small regions. Thus, a complicated optimization problem can be solved by a group of cost-effective agents. The trade-off is that each agent must compute locally optimal trajectories more than once. Nevertheless, in terms of cost and reliability, deploying a group of cheap agents using local pursuit offers various advantages over achieving the same task with a single, expensive agent.


⁴ This conclusion will be proved in the next chapter.
Chapter 4

Current Progress 


In this chapter we investigate the collective behavior of the group under local pursuit. Recall that each algorithm defines an ordered sequence of trajectories $\{x_k(t)\}$. The convergence of the sequence under SLP is explored first. Then the limiting trajectory is proved to be locally optimal; this is exactly the collective property we seek. Similar results are established for CLP and FFTLP. Special cases concerning path length and time minimization problems are introduced because of their prevalence in practice. Lastly, simulation experiments are provided to illustrate our results.


4.1 Results on Sampled Local Pursuit 

We would like to investigate the properties of the limiting trajectory generated by the group, i.e. of $x_k(t)$ as $k \to \infty$. The convergence of the trajectories’ cost is explored first, then the convergence of the trajectories $\{x_k(t)\}$ themselves. After that, we show that the limiting trajectory of the sequence, denoted by $x_\infty(t)$, is locally optimal.

Lemma 4.1 (Convergence of Cost in SLP): Assume a group of agents $x_0, x_1, \ldots, x_k$ evolves under sampled local pursuit with starting state $x_0$ and target state $x_f$. Suppose an initial control/trajectory pair $\{u_0(t), x_0(t)\}$ ($t \in [0, T]$) satisfying $x_0(0) = x_0$ and $x_0(T) = x_f$ is given. If the updating interval satisfies $0 < \delta < \Delta$, then the cost of the iterated trajectories converges, i.e. $\lim_{k \to \infty} C(x_k, t_k, T)$ exists.

Sketch of Proof: Given an optimal control problem that has a solution, the cost of any trajectory satisfying the boundary conditions is bounded below. By investigating the pursuit process between $x_k(t)$ and $x_{k-1}(t)$ pairwise, we can prove that $C(x_k, t_k, T) \le C(x_{k-1}, t_{k-1}, T)$. A non-increasing sequence that is bounded below converges, so the sequence $\{C(x_k, t_k, T)\}$ converges. See Appendix A.2 for the detailed proof. □
Figure 4.1: Sketch of the pairwise pursuit process in SLP


Nonetheless, the convergence of the trajectories’ cost does not imply the convergence of the trajectories themselves. If there exist multiple locally optimal trajectories connecting the leader and follower at the updating times, then the convergence of the trajectories is not guaranteed; Lemma 4.1 only identifies an equivalence class of trajectories with the same cost.

If we restrict the pursuit process to a “small” region by selecting $\Delta$ sufficiently small, i.e. agents follow close to one another, there will exist a unique locally optimal trajectory from the follower to the leader at every updating time $t_k + i\delta$. We then obtain the following result:

Lemma 4.2 (Uniqueness of the Limiting Trajectory): If at each updating time the locally optimal trajectory obtained through SLP is unique, then the limiting trajectory $x_\infty(t)$ is also unique.

Sketch of Proof: We show that if there were more than one trajectory that $x_k(t)$ might take for $k$ large enough, then the cost of one trajectory would have to be less than that of the others. This contradicts Lemma 4.1, which shows that the limiting trajectories must have the same cost if they exist. See Appendix A.3 for the details of the proof. □


The locally optimal trajectories obtained at every updating time are smooth in many optimal control problems, e.g. solutions of the Euler-Lagrange equation in the calculus of variations. Nonetheless, $x_k(t)$ is only known to be piecewise smooth. For example, in $\mathbb{R}^2$ with $\dot{x}_k = u_k$, if the locally optimal trajectories are straight lines, $x_k(t)$ is not smooth, because there is a corner at the joint of two segments. However, we can show that the limiting trajectory is smooth on the time interval $[0, T]$ if the locally optimal trajectories obtained at every updating time are smooth. The following definitions will be needed to discuss the properties of the limiting trajectory.


Definition 4.1: Let $\gamma_1(t)$ and $\gamma_2(t)$ be trajectories of Eq. (3.1), defined on a time interval $I_1$ and another time interval $I_2$ respectively, where $I_1 \cap I_2 \neq \emptyset$. We say that $\gamma_1$ and $\gamma_2$ overlap if $\gamma_1(t) = \gamma_2(t)$ for all $t \in I_1 \cap I_2$.

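Definition 4.2: Let $\gamma_1(t)$ and $\gamma_2(t)$ be trajectories of Eq. (3.1), defined on a time interval $I_1$ and another time interval $I_2$ respectively, where $I_1 \cap I_2 \neq \emptyset$. The composition of $\gamma_1(t)$ and $\gamma_2(t)$ on the interval $I_1 \cup I_2$ is defined as

$$(\gamma_1 \circ \gamma_2)(t) = \begin{cases} \gamma_1(t), & t \in I_1,\ t \notin I_2 \setminus (I_1 \cap I_2) \\ \gamma_2(t), & t \notin I_1,\ t \in I_2 \setminus (I_1 \cap I_2) \end{cases}$$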
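Definitions 4.1 and 4.2 can be checked numerically on sampled trajectories. The sketch below tests overlap only at finitely many sample times (so a `True` result is evidence, not proof, of Definition 4.1) and builds the composition by using $\gamma_1$ on $I_1$ and $\gamma_2$ on the rest of $I_2$. Closed intervals, the tolerance and the example curves `line` and `bent` are assumptions for illustration.

```python
import numpy as np


def overlap(g1, I1, g2, I2, n=101, atol=1e-9):
    """Sampled check of Definition 4.1: g1(t) == g2(t) for t in I1 ∩ I2 (closed intervals)."""
    lo, hi = max(I1[0], I2[0]), min(I1[1], I2[1])
    if lo > hi:
        raise ValueError("I1 and I2 must intersect")
    return all(np.allclose(g1(t), g2(t), atol=atol) for t in np.linspace(lo, hi, n))


def compose(g1, I1, g2, I2):
    """Definition 4.2: use g1 on I1 and g2 on the part of I2 outside I1."""
    def g(t):
        if I1[0] <= t <= I1[1]:
            return g1(t)
        if I2[0] <= t <= I2[1]:
            return g2(t)
        raise ValueError(f"t = {t} lies outside I1 and I2")
    return g


def line(t):   # hypothetical trajectory piece
    return np.array([t, 2.0 * t])


def bent(t):   # hypothetical piece that differs from line on I1 ∩ I2 except at t = 0.5
    return np.array([t, 2.0 * t + (t - 0.5) ** 2])


I1, I2 = (0.0, 0.6), (0.4, 1.0)
print(overlap(line, I1, line, I2), overlap(line, I1, bent, I2))
gamma = compose(line, I1, line, I2)   # well defined because the two pieces overlap
print(gamma(0.3), gamma(0.9))
```

The composition is only meaningful when the pieces overlap; `compose` itself does not enforce this, so in practice one would call `overlap` first.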


Lemma 4.3 (Smoothness of Composition): Suppose that in Lemma 4.1 the updating interval $\delta$ and the following interval $\Delta$ satisfy $0 < \delta < \Delta$. Then the planned and realized trajectories of the limiting trajectory overlap. Furthermore, if the locally optimal trajectories obtained at every updating time are smooth, then the limiting trajectory is also smooth.

Sketch of Proof: We first show, by contradiction, that the planned and realized trajectories of $x_\infty(t)$ overlap. It then follows that the limiting trajectory is piecewise smooth with overlapping neighboring segments, so its smoothness over the entire time interval is an immediate consequence. See Appendix A.4 for the details of the proof. □

Before proceeding to the main theorem, we need the following condition.

Condition 4.1: Assume there exists an $\varepsilon > 0$ such that for all $a, b_1, b_2 \in B$ and all $\Delta > 0$, the optimal cost $\eta(a, b_1, \Delta, 0, \Delta)$ from $a$ to $b_1$ and the optimal cost $\eta(a, b_2, \Delta, 0, \Delta)$ from $a$ to $b_2$ satisfy

$$\|b_1 - b_2\| < \varepsilon \;\Rightarrow\; \|\eta(a, b_1, \Delta, 0, \Delta) - \eta(a, b_2, \Delta, 0, \Delta)\| < C\Delta \tag{4.1}$$

for some constant $C$ independent of $\Delta$.

A piecewise locally optimal trajectory is not necessarily optimal. However, the composition of overlapping locally optimal trajectories is locally optimal if Condition 4.1 is satisfied.
Lemma 4.4 (Composition of Optimal Trajectories): Let $\gamma_1(t)$ and $\gamma_2(t)$ be overlapping locally optimal trajectories defined on a time interval $I_1$ and another time interval $I_2$ respectively, where $I_1 \cap I_2 \neq \emptyset$. If Condition 4.1 is satisfied, then the composition $\gamma_1 \circ \gamma_2$ is locally optimal on $I_1 \cup I_2$.

Sketch of Proof: Suppose that the composition (call it $x^*(t)$) is not locally optimal. Then there must exist another trajectory (call it $x(t)$) nearby such that $\|x(t) - x^*(t)\|_\infty < \varepsilon$ and $C(x(t), 0, T) < C(x^*(t), 0, T)$. We can then use Condition 4.1 to obtain a contradiction, namely that $C(x(t), 0, T) \ge C(x^*(t), 0, T)$. See Appendix A.5 for the complete proof. □

The next theorem is an immediate consequence of the above lemmas. 

Theorem 4.1 (Sampled Local Pursuit): Suppose a group of agents $\{x_k\}$ evolves under sampled local pursuit and at each updating time $t = t_k + i\delta$ the locally optimal trajectory from $x_k(t)$ to $x_{k-1}(t)$ is unique. If the updating interval and following interval satisfy $0 < \delta < \Delta$ and Condition 4.1 is satisfied, then the trajectory sequence converges to a unique local optimum. Furthermore, if the locally optimal trajectories at every updating time are smooth, the limiting trajectory is also smooth.

Proof: From Lemma 4.2, the limiting trajectory is unique. The restrictions of $x_\infty(t)$ to $[0, \Delta)$ and to $[\delta, \delta + \Delta)$ are locally optimal, and since the realized and planned trajectories overlap (Lemma 4.3), the optimality of $x_\infty(t)$ on $[0, \delta + \Delta)$ follows from Lemma 4.4. Repeating this argument on $[i\delta, i\delta + \Delta]$ ($i = 0, 1, 2, \ldots$) shows that $x_\infty(t)$ ($t \in [0, T]$) is locally optimal. The proof of smoothness follows from a similar argument. □


4.2 Results on Continuous Local Pursuit 

In continuous local pursuit, the follower keeps updating its movement at every $t \in [t_k, t_k + T]$, i.e. the updating interval $\delta \to 0$. As in sampled local pursuit, we assume that $\Delta$ is chosen so that at every updating time there is a unique optimal trajectory from the follower to the leader. We first show that a single update to the leader’s trajectory results in a cost no greater than that incurred by the leader, no matter when the update occurs. Then we establish the convergence of the trajectories’ cost in CLP. The remaining arguments are quite similar to those for SLP.

Lemma 4.5: Let $\lambda \in [0, T)$. Suppose that a follower replicates the leader’s trajectory on $t \in [t_k, t_k + \lambda) \cup [t_k + \lambda + \Delta, t_k + T]$ if $\lambda < T - \Delta$ (or on $t \in [t_k, t_k + \lambda)$ if $\lambda \ge T - \Delta$), while during $[t_k + \lambda, t_k + \lambda + \Delta]$ (or $[t_k + \lambda, t_k + T]$) it follows the optimal trajectory joining $x_k(t_k + \lambda)$ and $x_{k-1}(t_k + \lambda)$ in $\Delta$ (or $T - \lambda$) units of time. Then the cost along the follower’s trajectory is no greater than the leader’s.

Sketch of Proof: We compare the costs along the follower and the leader. The overlapping parts of the leader’s and follower’s trajectories incur equal costs, while during $[t_k + \lambda, t_k + \lambda + \Delta]$ the follower incurs no more cost than the leader, because it moves along an optimal trajectory there. It follows that the total cost along the follower is no greater than the leader’s. See Appendix A.6 for the complete proof. □

Figure 4.2: Sketch of a single update.
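For the path length cost in $\mathbb{R}^2$ (an assumed example), the optimal trajectory joining two states is the straight chord, so a single update replaces one window of the leader’s path by a chord. The sketch below checks, for several update times $\lambda$, that the follower’s cost never exceeds the leader’s, clipping the window at $T$ when $\lambda > T - \Delta$ as in the lemma. The sample path, the window length and the names `single_update` and `path_length` are hypothetical.

```python
import numpy as np


def path_length(xy):
    """Length of the polyline through the rows of xy (the path length cost)."""
    return float(np.sum(np.linalg.norm(np.diff(xy, axis=0), axis=1)))


def single_update(leader_path, i0, i1):
    """Follower copies the leader's path except on samples i0..i1, where it takes the
    straight chord between them (the path-length optimum joining the two states)."""
    follower = leader_path.astype(float)
    w = np.linspace(0.0, 1.0, i1 - i0 + 1)[:, None]
    follower[i0:i1 + 1] = (1.0 - w) * leader_path[i0] + w * leader_path[i1]
    return follower


s = np.linspace(0.0, 1.0, 201)
leader = np.column_stack([s, 0.3 * np.sin(3 * np.pi * s)])  # hypothetical leader path
lag = 40                                                    # Delta = 0.2 on this grid
checks = []
for i0 in range(0, len(s) - 1, 20):                         # update times lambda = s[i0]
    i1 = min(i0 + lag, len(s) - 1)                          # clip at T when lambda > T - Delta
    follower = single_update(leader, i0, i1)
    checks.append((s[i0], path_length(follower), path_length(leader)))
for lam, c_follow, c_lead in checks:
    print(f"lambda = {lam:.1f}: follower {c_follow:.4f} <= leader {c_lead:.4f}")
```

The inequality here is just the triangle inequality applied to the replaced window, which is the path-length instance of the cost comparison in the sketch of proof.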

Lemma 4.6 (Convergence of Cost in CLP): In the case of continuous local pursuit, the 
cost of the iterated trajectories converges. 

Sketch of Proof: The movement of the $k$th agent under CLP can be interpreted as the result of applying infinitely many “updates” to the leader’s trajectory. By Lemma 4.5, each update leads to a non-increasing cost, so infinitely many updates also leave the follower with a cost no greater than the leader’s. See Appendix A.7 for the details of the proof. □


Now the main result concerning continuous local pursuit can be derived easily by an 
argument similar to what was used for sampled local pursuit. 

Theorem 4.2 (Continuous Local Pursuit): Suppose a group of agents evolves under continuous local pursuit and that at every updating time $t$ the locally optimal trajectory from $x_k(t)$ to $x_{k-1}(t)$ is unique. Then the limiting trajectory is unique and locally optimal. It is also smooth if the locally optimal trajectories calculated at every updating time are smooth.

Proof: First, assume there are two different limiting trajectories $x_1(t)$ and $x_2(t)$. The proof of Lemma 4.2 shows that if updates occur during the non-overlapping parts of successive trajectories $x_1(t)$ and $x_2(t)$, the total cost along the follower will be less than the leader’s, even when infinitely many updates occur, because the number of updates does not change this property. If there existed more than one limiting trajectory, this decrease of cost from the leader to the follower would contradict the fact that all limiting trajectories must have the same cost. Therefore the limiting trajectory is unique. It follows that $x_{k-1}(t - \Delta) = x_k(t)$ if $x_{k-1}(t) = x_\infty(t - t_{k-1})$. If we pick $\delta_1$ such that $0 < \delta_1 < \Delta$, the limiting trajectory is piecewise smooth and locally optimal. Using the arguments of Lemma 4.3 and Lemma 4.4, we conclude that $x_\infty(t)$ is smooth and locally optimal over the entire time interval. □


4.3 Results on Free Final Time Local Pursuit 

Notice that Lemma 4.5 still holds for free final time local pursuit. The convergence of the trajectories’ cost follows easily from arguments similar to those in Lemma 4.6. Adapting the argument of Theorem 4.2 to the free final time setting yields the following result.

Theorem 4.3 (Free End-Time Local Pursuit): Suppose a group of agents evolves under free final time local pursuit and at every updating time $t$ the locally optimal trajectory from $x_k(t)$ to $x_{k-1}(t)$ with free final time is unique. Then the limiting trajectory is unique and locally optimal; it is also smooth if the locally optimal trajectories calculated at every updating time are smooth.

Proof: The proof is simple and will be omitted here. □ 


4.4 Summary 

So far, we have seen that each algorithm (SLP, CLP, FFTLP) generates an interesting “collective pattern”: the local optimum of the proposed optimal control problem. Although each agent only solves the optimal control problem within a small region (limited by $\Delta$), the trajectories generated by the agents are gradually optimized. Each agent “learns” from its predecessor, and the limiting trajectory exhibits the collective intelligence of the group. Therefore, a complicated task (optimizing over a long distance) is separated into small tasks that require fewer sensing, communication and computing capabilities.

Our algorithms fall into the category of “learning by repetition”. Newton’s method and gradient methods are well-known examples in this category and are usually applied to solve extremal problems in finite-dimensional vector spaces [6]. Extensions of such methods to function spaces also enable trajectory optimization algorithms based on repetition. For example, the work of [40] used a gradient method to iteratively optimize the control of a specified dynamic system. The control $u(t)$ is updated by


$$\frac{du}{dt} = -\alpha \frac{\partial}{\partial u}\left[\frac{\partial W(x,t)}{\partial x} X(x,u)\right] \tag{4.2}$$

where $X(x,u) = \dot{x}(t)$ are the system dynamics and $W(x,t)$ is the minimal cost of reaching the final state $x_f$ from the initial state $x(t_0) = x$. Eq. (4.2) converges to the optimal control $u^*(t)$ and trajectory $x^*(t)$ if the optimal control is smooth.

However, existing algorithms usually require the cost function and the control to be differentiable. To proceed with the above algorithm, they also need to store and describe the entire trajectory $x_k$ in order to obtain $x_{k+1}$. Moreover, obtaining a smooth curve requires infinitesimally small time increments, which introduces laborious calculations. All these factors hinder the application of these algorithms in decentralized systems whose members work cooperatively.

In contrast, our proposed algorithms are suitable for a large class of optimization problems and do not suffer from the above drawbacks. For example, our algorithms can be applied in situations where the control and trajectory are not smooth, such as bang-bang control. The computing requirement for each agent can be limited by choosing an appropriate $\Delta$. Furthermore, each agent only needs very limited information about its predecessor, so multiple agents can work together effectively.


4.5 Special Cases: Length and Time Minimization 

We have an additional result for trajectory optimization problems that involve reaching a desired target state with minimum path length or minimum final time. We state it as follows.

Theorem 4.3: If the time rate of change of the cost along a trajectory is independent of $x_k(t)$ for all $t$, then the minimum cost from the follower to the leader with free final time is strictly decreasing under local pursuit, unless the leader moves along a locally optimal trajectory.

Proof: Let $\rho(a, b) = J_F(x^*, \dot{x}^*, \Gamma)$ be the minimum cost to steer the system from a state $a$ to another state $b$. For the pursuit process shown in Fig. 4.3, we have
Figure 4.3: The minimum cost from $x_{k+1}(t)$ to $x_k(t)$ is decreasing if $dC/dt$ is independent of the state.


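$$\begin{aligned} \rho(x_{k+1}(t+\delta), x_k(t+\delta)) &\le \rho(x_{k+1}(t+\delta), x_k(t)) + \rho(x_k(t), x_k(t+\delta)) \\ &\le \rho(x_{k+1}(t+\delta), x_k(t)) + C(x_k(t), t, \delta) \\ &= \rho(x_{k+1}(t+\delta), x_k(t)) + C(x_{k+1}(t), t, \delta) \\ &= \rho(x_{k+1}(t), x_k(t)) \end{aligned} \tag{4.3}$$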


If the equalities hold in Eq. (4.3), then $x_k(t)$ must be moving along an optimal trajectory. □
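The monotonicity in Eq. (4.3) is easy to observe numerically in the unit-speed case, where $\rho$ is the Euclidean distance (the minimum time at unit speed in an obstacle-free plane, an assumed setting). In the sketch below the follower always heads at unit speed for the leader’s current state; when the leader moves along a straight line (a locally optimal trajectory) the gap should stay constant, and when the leader moves along the unit circle the gap should shrink. The leader paths, step size and the name `pursuit_gaps` are hypothetical.

```python
import numpy as np


def pursuit_gaps(leader_at, start, dt, n_steps):
    """Follower moves at unit speed straight toward the leader's current state.

    Returns the Euclidean gap before each step; at unit speed this equals the minimum
    time to reach the leader, i.e. rho in Eq. (4.3) for this assumed setting.
    """
    pos = np.asarray(start, dtype=float)
    gaps = []
    for j in range(n_steps):
        gap = leader_at(j * dt) - pos
        dist = float(np.linalg.norm(gap))
        gaps.append(dist)
        if dist > 0:
            pos = pos + gap * min(dt, dist) / dist
    return np.array(gaps)


def straight_leader(t):   # hypothetical leader on a straight (locally optimal) path
    return np.array([0.3 + t, 0.0])


def circle_leader(t):     # hypothetical leader on the unit circle (not optimal)
    return np.array([np.cos(t), np.sin(t)])


dt, n = 1e-3, 2000
straight = pursuit_gaps(straight_leader, [0.0, 0.0], dt, n)
circle = pursuit_gaps(circle_leader, [0.7, 0.0], dt, n)
print(straight[0], straight[-1], circle[0], circle[-1])
```

The discrete steps obey the same chain as Eq. (4.3): the follower closes $dt$ of the gap while the leader opens at most $dt$, with equality only when the leader keeps moving straight away from it.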


This result has a variety of applications, e.g. the minimum time control problem

$$J(x, \dot{x}, 0, T) = T, \quad \|\dot{x}\| \le 1 \tag{4.4}$$

whose solution can be obtained via the maximum principle, or the minimum path length problem in which all agents move at unit speed

$$J(x, \dot{x}, 0, T) = T, \quad \|\dot{x}\| = 1, \quad T \text{ free} \tag{4.5}$$


4.6 Simulations 

We now present some simulation results concerning application of local pursuit in different 
optimal control problems. 

Sampled Local Pursuit 

To illustrate the effectiveness of sampled local pursuit, we solve the minimum path length problem on $\mathbb{R}^2$ with boundary conditions $x(0) = 0$, $x(1) = 1$. The optimal trajectory is obviously a straight line. We set $\delta = 0.25$, $\Delta = 0.5$, $T = 1$. Fig. 4.4 shows 5 trajectories iterated by sampled local pursuit; the 5th trajectory is close to a straight line.
Figure 4.4: Iterated trajectories of the minimum length problem through SLP on $\mathbb{R}^2$

A Lagrangian Example 

Fig. 4.5 illustrates the application of continuous local pursuit to systems with drift. Here the system dynamics are

$$\dot{x}(t) + x(t) = u(t)$$

and we want to minimize

The locally optimal trajectory can be obtained through the Euler-Lagrange equation from the calculus of variations. The following interval $\Delta$ is set to 0.5. Fig. 4.5 shows that the trajectory sequence converges to the optimum.

Figure 4.5: Iterated trajectories for the Lagrangian problem through CLP with $\Delta = 0.5$

Minimum Time Control 

Consider the following second-order system

and we want to minimize the cost $J(x, \dot{x}, 0, T) = T$ with the boundary conditions $x(0) = \pi$, $x(T) = 0$. From the maximum principle, it is well known that the optimal control for this problem is the bang-bang control


$$u^*(t) = \begin{cases} -1, & t \in [0, T/2) \\ 1, & t \in [T/2, T] \end{cases} \tag{4.6}$$


With the following interval $\Delta = 0.37T$, as Fig. 4.6 illustrates, the trajectory of the 6th agent is essentially under optimal control, which means the convergence is very rapid.
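As a sanity check on Eq. (4.6), the sketch below assumes that the system is the double integrator $\ddot{x} = u$ with $|u| \le 1$ and zero initial and final velocity; this is an assumption, since the system equation is not reproduced above. Under that assumption, switching at $T/2$ gives $x(T) = \pi - (T/2)^2$, so the minimum time is $T = 2\sqrt{\pi}$. The function name is hypothetical.

```python
import math


def bang_bang_final_state(x0, n=1000):
    """Apply Eq. (4.6) to the assumed double integrator x'' = u, starting at rest.

    Returns (T, x(T), v(T)). Each step is integrated exactly because u is constant
    on it, and the switch at T/2 falls on a step boundary (n is even).
    """
    assert n % 2 == 0
    T = 2.0 * math.sqrt(x0)  # rest-to-rest minimum time for |u| <= 1 (assumed system)
    dt = T / n
    x, v = x0, 0.0
    for i in range(n):
        u = -1.0 if i < n // 2 else 1.0  # u* = -1 on [0, T/2), +1 on [T/2, T]
        x += v * dt + 0.5 * u * dt * dt
        v += u * dt
    return T, x, v


T, xT, vT = bang_bang_final_state(math.pi)
print(f"T = {T:.4f}, x(T) = {xT:.2e}, v(T) = {vT:.2e}")
```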



Figure 4.6: Iterated trajectories for the minimum time control problem through FFTLP with $\Delta = 0.37T$


Geodesic Discovery 

Now we show a geodesic discovery example that involves complicated calculation over the entire environment but relatively simple calculation in local patches. This example simulates two hills by two cones. The starting state is $(3500, 0, 0)$ and the end state is $(-1300, 0, 0)$. The first agent moves on a trajectory that follows the border of the cones. The geodesic over a large distance is not easy to compute, because it not only demands knowledge of the entire map but also involves 4 coordinate switches along the path¹.

Figure 4.7: Iterated trajectories for the geodesic discovery problem through CLP.

However, if we set $\Delta = 0.2T$, the follower is required to perform calculations involving at most one coordinate switch, so the amount of calculation at every step is reduced compared to computing over the whole map. As Fig. 4.7 illustrates, the iterated trajectories converge to the optimum.


¹ If a numerical method is applied over the entire map, the number of time segments is 4.
Chapter 5


Ongoing Work 


In this chapter we discuss some ongoing research directions related to local pursuit, listed as follows.

• Notice that a large category of optimal control problems involves free final states or “point-to-set” problems, as opposed to problems with point boundary conditions. We would like to generalize our pursuit algorithm to such problems.

• The limiting trajectory obtained from local pursuit may converge to a global optimum 
or a local optimum, depending on the parameters of the algorithm. We will explore 
that dependence and determine which optimum the trajectory sequence converges to. 

• The performance of local pursuit with noisy measurements will be considered due to its 
relevance in practice. We want to know whether the agents can estimate the solution 
in the absence of precise sensor readings. 

• We will explore the advantages of local pursuit in the numerical computation of optimal 
trajectories. 

Finally, we will look into the development of other biologically inspired algorithms for complicated tasks in engineering and other fields. The potential tasks and ongoing steps are outlined next.

5.1 Optimal Control Problems with Free Final State 

Many optimal control problems with fixed final time include a penalty on the final state but do not impose any constraints on it, i.e.

$$J = Q(x(t_0 + T)) + \int_{t_0}^{t_0 + T} g(x(t), \dot{x}(t), t)\, dt \tag{5.1}$$
One example is the LQR problem, which, as is well known, can be solved by introducing a feedback control and a Riccati equation.

We would like to modify local pursuit to apply to this class of problems, if possible. Recall that local pursuit gradually improves an initial solution; it seems that if the cost incurred by each agent is no greater than that of its predecessor, the trajectory sequence will converge to a local optimum. However, if agents always move on locally optimal trajectories from themselves to their predecessors, we obtain a trajectory sequence with non-increasing cost, but the end point is determined by the first agent and is not necessarily the best choice. We should have some freedom in choosing the final state instead of fixing it by simply catching up with the leader’s position. On the other hand, if at every updating step the follower solves an optimal control problem with free final state, then it does not need the leader: the follower can determine the locally optimal trajectory from its current state and $\Delta$ alone, so the agents are completely independent. Such aimless pursuit does not let the follower “learn” from the leader, and we cannot guarantee that the follower does better than the leader.

Based on the above considerations, we let the follower “catch” the leader before the leader reaches the final state, i.e. the follower solves an optimal control problem with a fixed end point during $[t_k, t_k + i\delta)$, where $i\delta < T - \Delta$. After the leader reaches the final state, the follower solves an optimal control problem with free final state. By dividing the time into two phases, “catching up” and “free running”, the follower can both “learn” from the leader and choose the best final state. The trajectory sequence is expected to be gradually optimized through learning while also benefiting from the freedom in the final state.

As before, we define the cost of a segment of an optimal trajectory over $[t_0, t_0 + T]$ as:



(5.2) 


where $x^*(t)$ minimizes Eq. (5.1) subject to $x^*(t_0) = a$. We set up an algorithm similar to SLP, except that step 2 is replaced by:

2. When $t = t_k + i\delta$, $i = 0, 1, 2, 3, \ldots$, calculate $u_t^*(\tau)$ such that $f(x_k(\tau), u_t^*(\tau)) = \dot{x}_k^*(\tau)$, where

$\tau \in [t, t + \Delta]$ if $\Delta + i\delta < T$
$\tau \in [t, t_k + T]$ otherwise


If the final state is not free but restricted to a set, it must satisfy the final condition

$$q(x(t_0 + T)) = 0, \quad T \text{ free} \tag{5.3}$$
For example, if the final state lies on the unit circle, the condition is $\|x(t_0 + T)\| = 1$. From optimal control theory, we know that the best final time and state can be determined by the transversality condition.

We also define the cost of a segment of an optimal trajectory over $[t_0, t_0 + T]$ as:

$$\eta_{fsg}(a, T, t_0) = Q(x^*(t_0 + T)) + \int_{t_0}^{t_0 + T} g(x^*(t), \dot{x}^*(t), t)\, dt \tag{5.4}$$

where $x^*(t)$ is the optimal trajectory for the cost in Eq. (5.1) and satisfies $x^*(t_0) = a$, $q(x^*(t_0 + T)) = 0$.

Of course, we need both “catching up” and “free running” phases when applying local pursuit to such problems. We set up the algorithm in the same way as CLP, except that step 2 is replaced by:

2. Calculate $u_t^*(\tau)$ for all $t \in [t_k, t_k + T]$ such that $f(x_k(\tau), u_t^*(\tau)) = \dot{x}_k^*(\tau)$, where $x_k^*(\tau)$ achieves

$$\begin{cases} \eta(x_k(t), x_{k-1}(t), \Delta, t, \Delta), & \tau \in [t, t + \Delta], \text{ if } t < t_k + T - \Delta \\ \eta_{fsg}(x_k(t), t_k + T - t, t), & \tau \in [t, t_k + T], \text{ otherwise} \end{cases}$$


The remaining work is to prove the optimality of the limiting trajectory obtained from the above algorithms. We may follow steps similar to those used for SLP and CLP:

1. Proving the convergence of the cost incurred by the trajectories. 

2. Proving the uniqueness of the limiting trajectory. 

3. Proving the optimality of the composition of two segments of locally optimal trajectories.


4. Proving the local optimality of the limiting trajectory. 

5.2 Convergence to Global vs Local Optimum 

The limiting trajectory in local pursuit is determined by the parameters $\Delta$ and $\delta$ and the initial trajectory $x_0(t)$. As we shall see, different parameters may lead to different local optima. The initial trajectory $x_0(t)$ is generated by estimation or random exploration and is not determined by the algorithms themselves.

There is an obvious trade-off in choosing $\Delta$: large values of $\Delta$ may place significant demands on each agent’s sensing, communication and computing capabilities; however, a large $\Delta$ also generally results in faster convergence and in the ability of local pursuit to “escape” local optima. Because of space limitations, we restrict our discussion to the following example¹.

¹ In this section, all the examples deal with path length minimization problems.
• If pursuit takes place on a surface with “holes” or “obstacles” and the initial feasible path winds around the obstacles, the iterated trajectories may converge to a global optimum instead of a local one when $\Delta$ is large. For example, if $\Delta$ is greater than half the perimeter of the largest circle that surrounds the holes and all agents move at unit speed, then the iterated trajectories converge to the global optimum.



Figure 5.1: Larger $\delta$ may lead to a better result. (Panel label: $\Delta = 3.1416$, $\delta = 3$.)


The updating interval $\delta$ is much easier to adjust, because the only requirement on it is that $0 < \delta < \Delta$. Nonetheless, there seems to be no simple relationship between the group’s performance and $\delta$. A smaller $\delta$ means more frequent updates and might seem to bring a better result; in fact, however, it may lead to a local optimum instead of the global optimum. This can be seen from the following example.

• Suppose pursuit takes place on a plane with a hole of unit radius, and each agent moves at unit speed. Let the first agent move counterclockwise along a locally optimal path from $S$ to $E$, as illustrated on the left of Fig. 5.1, and set $\Delta = 3.1416$ (slightly more than $\pi$). Simulation shows that for some $\delta$, e.g. $\delta = 2.5$, all followers travel along the same trajectory as the leader, whereas for other $\delta$, e.g. $\delta = 3$, the iterated trajectories converge to the global optimum. Here a larger $\delta$ leads to a better result. The two limiting trajectories are contrasted in Fig. 5.1.

We see that carefully selected parameters may lead to the global optimum. Directly determining the parameters that lead to a desired local optimum seems difficult. However, given the parameters of an algorithm and a desired local optimum, we can investigate whether the trajectory sequence converges to it. We will proceed using Lyapunov's method.
Recall that an agent’s trajectory $x_{k+1}(t)$ is determined by the algorithm’s parameters once its leader’s trajectory $x_k(t)$ is given. At every updating time, the locally optimal trajectory is determined by the starting state, the end state and $\Delta$, so we can assume that the optimal trajectory minimizing the cost $J$ is given by the mapping


$$x^*(\tau) = h(a, b, \Delta, \tau), \qquad \tau \in [t, t + \Delta] \tag{5.5}$$


with the boundary conditions $x(t) = a$ and $x(t + \Delta) = b$. We assume that the mapping $h: D \times D \times \mathbb{R}^+ \times I \to D \times I$ ($I$ is a time interval) defines a continuous trajectory. Given fixed $\Delta$ and $\delta$ (assuming we start with SLP), and denoting the $k$th trajectory by $x_k(t)$, $t \in [t_k, t_k + T]$, the $(k+1)$th trajectory is


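$$
x_{k+1}(t) =
\begin{cases}
h\big(x_{k+1}(t_{k+1} + j\delta),\; x_k(t_k + \Delta + j\delta),\; \Delta,\; t\big), & t \in [t_{k+1} + j\delta,\, t_{k+1} + (j+1)\delta),\; j = 0, 1, \dots, i \\
h\big(x_{k+1}(t_{k+1} + (i+1)\delta),\; x_k(t_k + T),\; T - (i+1)\delta,\; t\big), & \text{otherwise}
\end{cases}
\tag{5.6}
$$

where the integer $i$ satisfies $i\delta < T - \Delta$ and $(i+1)\delta > T - \Delta$.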
For simplicity, we can write Eq. (5.6) as 


$$x_{k+1}(t) = f_p\big(x_k(t), \Delta, \delta, t\big), \qquad t \in [0, T] \tag{5.7}$$

where $f_p: D \times [0, T] \times \mathbb{R}^+ \times \mathbb{R}^+ \to D \times [0, T]$ is a continuous function. This is similar to the state equation of a dynamic system if we regard every trajectory as a state in the space of trajectories $D \times [0, T]$.
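As a concrete illustration of the mapping $f_p$ in Eqs. (5.6) and (5.7), the following Python sketch implements one iteration of sampled local pursuit for the simplest setting we can think of: minimizing path length in an obstacle-free plane. It assumes that the locally optimal trajectory $h(a, b, \Delta, \tau)$ is the constant-speed straight segment from $a$ to $b$, that trajectories are sampled on a uniform grid over $[0, T]$ measured from each agent's own departure time, and that $\Delta$ and $\delta$ are integer multiples of the grid step. The helper names (`path_length`, `straight_line`, `slp_step`) are hypothetical and not part of the algorithm's original description.

```python
import numpy as np


def path_length(traj):
    """Cost C of a sampled trajectory: length of the polyline through its samples."""
    return float(np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1)))


def straight_line(a, b, duration, elapsed):
    """Assumed planar h(a, b, duration, .): constant-speed segment from a to b,
    evaluated `elapsed` grid steps after the plan was made."""
    return a + (elapsed / duration) * (b - a)


def slp_step(leader, big_delta, small_delta):
    """One sampled-local-pursuit iteration x_{k+1} = f_p(x_k, Delta, delta).

    leader: array (N + 1, dim) sampled at s_n = n * dt, measured from the
            leader's own departure time.
    big_delta, small_delta: Delta and delta in grid steps, 0 < delta < Delta <= N.
    """
    n_steps = leader.shape[0] - 1
    follower = np.empty_like(leader, dtype=float)
    follower[0] = leader[0]  # the follower departs from the same starting state
    n = 0  # grid index of the current updating time
    while n < n_steps:
        start = follower[n]
        if n + big_delta <= n_steps:
            # The leader departed Delta earlier, so it is now at its own step
            # n + Delta: plan to reach that point in Delta, but re-plan after delta.
            target, horizon, hold = leader[n + big_delta], big_delta, small_delta
        else:
            # The leader rests at the final state: go there in the remaining time.
            target, horizon, hold = leader[n_steps], n_steps - n, n_steps - n
        for m in range(1, hold + 1):
            follower[n + m] = straight_line(start, target, horizon, m)
        n += hold
    return follower


if __name__ == "__main__":
    # Leader: a semicircle from (0, 0) to (2, 0); the straight line has length 2.
    s = np.linspace(0.0, np.pi, 201)
    x = np.stack([1.0 - np.cos(s), np.sin(s)], axis=1)
    costs = [path_length(x)]
    for _ in range(30):
        x = slp_step(x, big_delta=40, small_delta=10)
        costs.append(path_length(x))
    print(f"initial cost {costs[0]:.4f}, after 30 iterations {costs[-1]:.4f}")
```

Each call to `slp_step` plays the role of one application of $f_p$. In this obstacle-free example the recorded costs should be non-increasing, as Lemma 4.1 asserts, and can never fall below the straight-line length 2.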

Since Lyapunov’s method is a commonly applied technique for analyzing the convergence of dynamic systems, we plan to construct a Lyapunov function on the space of trajectories. The constructed Lyapunov function should satisfy the following conditions:


$$V(x^*(t)) = q \quad \text{and} \quad V(x_k(t)) > q \quad \text{for } x_k(t) \in D \times [0, T] - \{x^*(t)\} \tag{5.8}$$

$$V(x_{k+1}(t)) - V(x_k(t)) < -p_k < 0 \quad \text{for } x_k(t) \in D \times [0, T] - \{x^*(t)\} \tag{5.9}$$


where $x^*(t)$ is the predetermined local optimum and $p_k \to 0$ only if $x_k(t) \to x^*(t)$. If we can find such a Lyapunov function, we can conclude that the trajectory sequence generated by local pursuit and started with an initial trajectory $x_0(t) \in D \times [0, T]$ converges to $x^*(t)$. Furthermore, by finding a region in the space of trajectories where the Lyapunov function satisfies Eqs. (5.8) and (5.9) and is bounded above, we expect to find the region of attraction of this local optimum.
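Conditions (5.8) and (5.9) can at least be checked numerically on a finite, sampled trajectory sequence. The Python sketch below does this for a user-supplied candidate $V$. In the demonstration, $V$ is taken to be path length (so $q$ is the length of $x^*$), and the sequence $x_k = x^* + 2^{-k} b$, with $b$ a fixed bump, is a hypothetical sequence chosen for illustration; it is not generated by local pursuit. A check of this kind can refute a candidate Lyapunov function on the sampled sequence, but it cannot prove convergence. The function names are our own.

```python
import numpy as np


def path_length(traj):
    """Candidate Lyapunov function V: polyline length of a sampled trajectory."""
    return float(np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1)))


def check_lyapunov(trajs, V, x_star, tol=1e-12):
    """Empirically check Eqs. (5.8)-(5.9) along trajs = [x_0, x_1, ...].

    Returns q = V(x*) and the decrements p_k = V(x_k) - V(x_{k+1});
    raises ValueError at the first violated condition.
    """
    q = V(x_star)
    decrements = []
    for x_k, x_next in zip(trajs[:-1], trajs[1:]):
        if np.allclose(x_k, x_star):
            continue  # both conditions only constrain x_k != x*
        if V(x_k) <= q + tol:
            raise ValueError("Eq. (5.8) violated: V(x_k) <= q")
        p_k = V(x_k) - V(x_next)
        if p_k <= tol:
            raise ValueError("Eq. (5.9) violated: V did not decrease")
        decrements.append(p_k)
    return q, decrements


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 101)
    x_star = np.stack([t, np.zeros_like(t)], axis=1)  # straight segment of length 1
    bump = np.stack([np.zeros_like(t), np.sin(np.pi * t)], axis=1)
    trajs = [x_star + 0.5**k * bump for k in range(11)]  # hypothetical sequence
    q, p = check_lyapunov(trajs, path_length, x_star)
    print(q, p[0], p[-1])  # the decrements p_k shrink as x_k approaches x*
```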


5.3 Pursuit with Noisy Measurements 

In the real world, the sensors and actuators embedded in robots are not perfect, and their operation is often distorted by noise. There are several key points to understand about this uncertainty [21]:
• Sensors only deliver uncertain values in practice. At best they deliver an approximation of what they are measuring. Disturbances in the environment and differences between physical parts and measuring mechanisms also introduce unexpected errors in every sensor. Moreover, sensors do not deliver direct descriptions of the world: they do not identify objects, nor do they separate the effects of their own motion from the objects’ motion. Therefore we can hardly obtain an accurate model of a real sensor.

• Commands to actuators can have uncertain effects. High-level action commands may pass through many layers of refinement before they become appropriate motor currents, and each layer may introduce uncertainty. Depending on the accuracy of the hardware and software, errors can accumulate rapidly. These uncertainties make it difficult to model actuators accurately.

We want to investigate the collective behavior of the system when the measurements made by agents are subject to noise, and we would like to develop algorithms that work not only well but also robustly. For simplicity, we consider the noise of sensors and actuators together and model it in a generic, abstract form as

$$x(t) = x^*(t) + \xi(x)\,\omega(t) \tag{5.10}$$

where $x^*(t)$ is the actual optimal trajectory, $\xi(x)$ is a real-valued function and $\omega(t)$ is a white Gaussian process with mean $\mu$ and variance $\sigma^2$. We are interested in the limiting trajectory and in determining its distribution.

Another source of uncertainty comes from the estimation of optima, when precise locally optimal trajectories are impossible to obtain even though all measurements and models are perfect. For example, for an uneven terrain that cannot be described by any existing geometric object, it is hard to obtain an analytical solution for its geodesics. However, we can sometimes estimate the solution with bounded error through numerical methods or other simple rules, by investigating properties of the environment and of the optimal solution. The error of the local estimate is related to the “following distance” between the leader and the follower: the smaller the distance, the more precise the estimate. So in this case the locally optimal trajectories that agents obtain are as follows:

$$x(t) = x^*(t) + e(t), \qquad \|e(t)\|_\infty < \varepsilon \tag{5.11}$$

and we are interested in finding the range of the limiting trajectory’s error. If the error of the limiting trajectory is bounded (depending on $\varepsilon$), then we have a method to translate the error of the local trajectories into the error of the entire trajectory.
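When simulating pursuit with imperfect agents, the two error models (5.10) and (5.11) can be sampled directly. The Python sketch below draws one realization of each on a sampled trajectory. Approximating white noise by independent Gaussian samples at the grid times, evaluating $\xi$ at $x^*(t)$ rather than at the noisy $x(t)$, and drawing $e(t)$ with a uniformly random direction and a norm below $\varepsilon$ are all assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np


def noisy_measurement(x_star, xi, mu, sigma, rng):
    """Sample Eq. (5.10), x(t) = x*(t) + xi(x) w(t).

    White noise is approximated by i.i.d. Gaussian samples N(mu, sigma^2) at
    each grid time; xi is evaluated at x*(t) (an assumption) and must return
    an array of shape (N + 1, 1) so that it scales every coordinate.
    """
    w = rng.normal(loc=mu, scale=sigma, size=x_star.shape)
    return x_star + xi(x_star) * w


def bounded_estimate(x_star, eps, rng):
    """Sample Eq. (5.11), x(t) = x*(t) + e(t) with ||e||_inf < eps.

    Each e(t) has a uniformly random direction and a norm drawn from [0, eps).
    """
    directions = rng.normal(size=x_star.shape)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = eps * rng.uniform(size=(x_star.shape[0], 1))
    return x_star + radii * directions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 1001)
    x_star = np.stack([t, np.sin(np.pi * t)], axis=1)

    def xi(x):
        return 0.05 * np.ones((len(x), 1))  # hypothetical constant noise gain

    x_noisy = noisy_measurement(x_star, xi, 0.0, 1.0, rng)
    x_est = bounded_estimate(x_star, 0.01, rng)
    print(np.abs(x_noisy - x_star).max(), np.linalg.norm(x_est - x_star, axis=1).max())
```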

In order to proceed with this, the following steps will be considered: 
1. Modeling the locally optimal trajectories at updating times, as in Eqs. (5.10) and (5.11).

2. Investigating the evolution of each pair of leader and follower under noisy measurements 
and the evolution of the trajectory’s error through the pursuit process. 

3. Finding the error of the limiting trajectory. 

5.4 Application in Numerical Computation of Optimal Control

The algorithms stated here can potentially lead to advances in the numerical computation of optimal trajectories for control systems. Numerical methods, including Newton’s method and gradient methods, are commonly applied in optimization. An obvious drawback of ordinary numerical methods is that they require a large amount of computation, because they optimize the result iteratively.

For example, the multiple shooting method is widely used in difficult applications, e.g. fuel optimization problems for spacecraft [9]. As Betts summarized in [8], “the fundamental idea of multiple shooting is to break the trajectory into shorter pieces or segments”. The time domain is broken into smaller intervals of the form $t_0 < t_1 < \cdots < t_M = t_f$. The initial value of the dynamic variable at the beginning of each segment is denoted by $u_j$ for $j = 0, 1, \dots, M-1$, and the value obtained by solving the system equation from $t_j$ to $t_{j+1}$ is denoted by $v_j$. The nonlinear programming (NLP) variables are defined as $x = [u_0, u_1, \dots, u_{M-1}]$, and the NLP constraints are

$$
c(x) =
\begin{bmatrix}
v_0 - u_1 \\
v_1 - u_2 \\
\vdots \\
v_{M-2} - u_{M-1} \\
\phi(v_{M-1}, t_f)
\end{bmatrix} = 0
$$

where $\phi(v_{M-1}, t_f) = 0$ is the boundary condition. The number of NLP variables and constraints is $n = n_v M$, where $n_v$ is the dimension of the dynamic variable $v$ and $M$ is the number of segments [8, 13]. The problem of minimizing the cost function $F(x)$ can then be solved by introducing the Lagrangian

$$L(x, \lambda) = F(x) - \lambda^T c(x)$$

Necessary conditions for $[x^*, \lambda^*]$ to be an optimum are given by

$$\nabla_x L(x, \lambda) = 0, \qquad \nabla_\lambda L(x, \lambda) = 0$$

When proceeding with the iterative process, the Jacobian matrix $\nabla_x c$ has dimension $n \times n = n_v M \times n_v M$. Hence the number of segments involved in the calculation affects the “degree of labor-consumption” at least in the order of $O(n^2)$. Moreover, increasing the number of variables leads to more iteration steps. If fewer segments were introduced during the calculation, the computational complexity could be decreased.
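The structure of multiple shooting can be seen in a small Python sketch. It assembles the NLP variables $x = [u_0, \dots, u_{M-1}]$, the continuity defects $v_j - u_{j+1}$ and the boundary condition $\phi(v_{M-1}, t_f)$, and solves $c(x) = 0$ by Newton's method with a finite-difference $n \times n$ Jacobian, $n = n_v M$. It covers only the feasibility part (there is no cost $F(x)$, hence no multiplier $\lambda$), integrates each segment with fixed-step fourth-order Runge-Kutta, and the harmonic-oscillator example is our own choice rather than one taken from [8]; all function names are hypothetical.

```python
import numpy as np


def rk4(f, v, t0, t1, steps=50):
    """Integrate dv/dt = f(t, v) from t0 to t1 with fixed-step RK4."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, v)
        k2 = f(t + h / 2, v + h / 2 * k1)
        k3 = f(t + h / 2, v + h / 2 * k2)
        k4 = f(t + h, v + h * k3)
        v = v + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return v


def constraints(x, f, grid, phi, n_v):
    """c(x) = [v_0 - u_1, ..., v_{M-2} - u_{M-1}, phi(v_{M-1}, t_f)]."""
    M = len(grid) - 1
    u = x.reshape(M, n_v)  # u_j: state at the start of segment j
    v = [rk4(f, u[j], grid[j], grid[j + 1]) for j in range(M)]  # v_j: state at t_{j+1}
    defects = [v[j] - u[j + 1] for j in range(M - 1)]
    return np.concatenate(defects + [phi(v[M - 1], grid[-1])])


def multiple_shooting(f, grid, phi, n_v, x0, tol=1e-10, max_iter=20, fd_step=1e-7):
    """Solve c(x) = 0 by Newton's method with an n x n finite-difference Jacobian."""
    x = np.asarray(x0, dtype=float).copy()
    n = x.size  # n = n_v * M NLP variables (and constraints)
    for _ in range(max_iter):
        c = constraints(x, f, grid, phi, n_v)
        if np.linalg.norm(c) < tol:
            break
        jac = np.empty((n, n))
        for i in range(n):  # one extra pass over all segments per column
            dx = np.zeros(n)
            dx[i] = fd_step
            jac[:, i] = (constraints(x + dx, f, grid, phi, n_v) - c) / fd_step
        x = x - np.linalg.solve(jac, c)
    return x


if __name__ == "__main__":
    # Toy problem (our choice): harmonic oscillator v = (position, velocity),
    # M = 4 segments on [0, pi/2], terminal condition v(t_f) = (1, 0).
    def f(t, v):
        return np.array([v[1], -v[0]])

    def phi(v_end, t_f):
        return v_end - np.array([1.0, 0.0])

    grid = np.linspace(0.0, np.pi / 2, 5)
    x = multiple_shooting(f, grid, phi, n_v=2, x0=np.zeros(8))
    print(x.reshape(4, 2))  # row j is u_j; the exact u_0 is (0, 1)
```

Building the Jacobian costs $n$ extra integration passes per Newton iteration, and the dense solve costs $O(n^3)$ operations, so the work grows quickly with the number of segments $M$.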

Another obvious example of “more time segments lead to increased complexity” is dynamic programming. Bellman introduced the Hamilton-Jacobi-Bellman (HJB) equation to describe the optimal control $u^*(x, t)$ as well as the cost-to-go function $J^*(x, t)$ for all possible initial conditions [17, 14]. HJB theory plays an important role in optimal control because it provides a sufficient condition for optimality, as opposed to the necessary conditions obtained from ordinary optimization methods [8]. However, the drawback of dynamic programming is the “curse of dimensionality”, as Bellman himself called it: even a moderately complicated problem involves an enormous amount of storage [15]. This drawback can be seen in the discrete shortest path problem [16, 17], illustrated by the trellis diagram in Fig. 5.2. Proceeding backward from the end point to the starting point, the worst case involves investigating $n_v^2 M$ paths and storing $n_v M$ data, where $n_v$ is the dimension of the state $x$ and $M$ is the number of segments. Dynamic programming with large $M$ is often infeasible due to an agent’s limited physical memory.



Figure 5.2: The trellis diagram of shortest path problem. 
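The trellis computation in Fig. 5.2 can be sketched as backward dynamic programming over stages. In the Python sketch below, which is our illustration rather than the algorithm of [16, 17], `edge_costs[m][i, j]` is an assumed cost of moving from node $i$ in stage $m$ to node $j$ in stage $m+1$. The cost-to-go tables `J` are the stored data, and each stage inspects every edge once. A brute-force enumeration is included only to check the result on small trellises, since its work grows exponentially with the number of stages.

```python
import itertools

import numpy as np


def backward_dp(edge_costs):
    """Backward DP on a layered trellis.

    edge_costs[m] has shape (n_m, n_{m+1}). Returns the cost-to-go tables
    J[m] (one value per node of stage m) and the optimal successor policy.
    """
    M = len(edge_costs)
    J = [None] * (M + 1)
    policy = [None] * M
    J[M] = np.zeros(edge_costs[-1].shape[1])  # terminal stage: zero cost-to-go
    for m in range(M - 1, -1, -1):
        total = edge_costs[m] + J[m + 1][np.newaxis, :]  # edge cost + cost-to-go
        policy[m] = np.argmin(total, axis=1)
        J[m] = total.min(axis=1)
    return J, policy


def extract_path(policy, start=0):
    """Follow the optimal successors forward from a start node."""
    path = [start]
    for successors in policy:
        path.append(int(successors[path[-1]]))
    return path


def brute_force_cost(edge_costs):
    """Enumerate every path from node 0 of the first stage (exponential work)."""
    sizes = [c.shape[1] for c in edge_costs]
    best = np.inf
    for nodes in itertools.product(*(range(s) for s in sizes)):
        path = (0,) + nodes
        cost = sum(edge_costs[m][path[m], path[m + 1]] for m in range(len(edge_costs)))
        best = min(best, cost)
    return best


if __name__ == "__main__":
    edge_costs = [np.array([[1.0, 4.0]]), np.array([[5.0], [1.0]])]
    J, policy = backward_dp(edge_costs)
    print(J[0][0], extract_path(policy))  # 5.0 [0, 1, 0]
```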
We would like to use numerical methods with lower computational complexity. One idea is to decrease the number of time segments in each computation. Fewer segments mean that optimization is executed only over smaller regions, which coincides with the idea of finding optima within small regions stated before. Therefore we plan to apply local pursuit in numerical methods and investigate the potential advantages, such as reduced physical requirements for each agent and a lower “degree of labor-consumption” for the group. To complete the argument, we should consider the following steps:

1. Applying local pursuit in numerical methods to solve some optimal control problems, investigating a single updating process and determining the requirements for an individual agent to run the algorithm, e.g. the size of storage and the computational complexity.

2. Finding the number of iterations needed to reach a satisfactory result, e.g. determining $k$ such that

$$\|x_k(t) - x^*(t)\|_\infty < \varepsilon \tag{5.12}$$

3. Investigating the requirements and computational complexity of the numerical methods ordinarily applied to the same problems.

4. Comparing the two kinds of numerical methods. 
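Step 2 amounts to measuring how many applications of $f_p$ are needed to satisfy Eq. (5.12). A minimal Python sketch of that measurement follows. The sup norm is taken over the sampled times, and the map used in the demonstration, which replaces each interior sample by the average of its neighbors, is a hypothetical stand-in for $f_p$ chosen only because its fixed point (the uniformly sampled straight segment) is known in closed form; it is not local pursuit.

```python
import numpy as np


def sup_error(x, x_star):
    """||x - x*||_inf over the sampled times (Euclidean norm at each time)."""
    return float(np.max(np.linalg.norm(x - x_star, axis=1)))


def iterations_to_tolerance(f_p, x0, x_star, eps, max_iter=100_000):
    """Smallest k with ||x_k - x*||_inf < eps, where x_{k+1} = f_p(x_k)."""
    x = x0
    for k in range(max_iter + 1):
        if sup_error(x, x_star) < eps:
            return k, x
        x = f_p(x)
    raise RuntimeError("tolerance not reached within max_iter iterations")


def neighbor_average(x):
    """Hypothetical stand-in for f_p: endpoints fixed, interior samples averaged."""
    y = x.copy()
    y[1:-1] = 0.5 * (x[:-2] + x[2:])
    return y


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 21)
    x_star = np.stack([t, np.zeros_like(t)], axis=1)  # fixed point of neighbor_average
    x0 = np.stack([t, np.sin(np.pi * t)], axis=1)
    k, _ = iterations_to_tolerance(neighbor_average, x0, x_star, eps=1e-3)
    print(k)
```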

5.5 Other Algorithms Inspired by Biology 

Besides ants, other social insects, e.g. worker honey bees, exhibit many group activities with remarkably coordinated behavior. The underlying mechanisms have been partly revealed by effective models of such activities. We are considering ways to “borrow” the rules that govern insect behavior and to develop additional biologically inspired algorithms for engineering problems. Some potential topics are as follows:

• The foraging activities of worker honey bees [1] can provide clues for solving resource allocation problems, which have numerous applications in engineering, economics and operations research, e.g. routing a group of taxis to pick up and deliver passengers who appear dynamically or randomly, or assigning a limited number of robots to several manufacturing processes.

• The work of [3] presented a model of how ants select ongoing foraging zones. According to this model, each ant initially has a uniform probability distribution over all foraging zones. Let the probability of foraging zone $i$ at time $t$ be $P_i(t)$. If at time $t$ the ant finds food in zone $i$, then $P_i(t+1) = P_i(t) + \min(P^+, 1 - P_i(t))$, where $P^+$ is a constant indicating the relative importance of “learning”. If not, then $P_i(t+1)$ is decreased. Through this mechanism, both an individual ant and a colony of ants evolve toward an optimal spatial distribution over foraging zones, gathering the maximum amount of food when the appearance of food in each zone is random and unknown to the ants. In engineering, this kind of “learning” is helpful, especially when unknown factors exist. For example, we may want to use a limited number of controllers to stabilize multiple plants, where each plant deviates from its equilibrium position according to an unknown distribution and we want to minimize the sum of the deviations. It is promising that we can let the controllers learn the distribution of each plant and develop decentralized rules for each controller.
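The learning rule of [3], as summarized above, can be simulated in a few lines. In the Python sketch below, the increase on success follows the stated formula; the decrease on failure, $P_i(t+1) = P_i(t) - \min(P^-, P_i(t))$, the renormalization of all probabilities after each update, the parameter values and the food-finding probabilities of the zones are our own assumptions, since the text does not specify them. The function names are hypothetical.

```python
import numpy as np


def update_zone_probabilities(P, zone, found_food, p_plus, p_minus):
    """One learning step for a single ant.

    Success: P_i <- P_i + min(P+, 1 - P_i)  (rule of [3] as quoted in the text).
    Failure: P_i <- P_i - min(P-, P_i)      (assumed form of "decreased").
    The vector is then renormalized to sum to 1 (assumption).
    """
    P = P.copy()
    if found_food:
        P[zone] += min(p_plus, 1.0 - P[zone])
    else:
        P[zone] -= min(p_minus, P[zone])
    return P / P.sum()


def simulate_ant(food_prob, steps, p_plus=0.05, p_minus=0.05, seed=0):
    """Simulate one ant that picks zones according to P, starting uniform."""
    rng = np.random.default_rng(seed)
    n = len(food_prob)
    P = np.full(n, 1.0 / n)  # uniform distribution at first
    for _ in range(steps):
        zone = rng.choice(n, p=P)
        found = rng.random() < food_prob[zone]
        P = update_zone_probabilities(P, zone, found, p_plus, p_minus)
    return P


if __name__ == "__main__":
    # Hypothetical zones: food is found with probability 0.9, 0.2 and 0.2.
    P = simulate_ant(food_prob=[0.9, 0.2, 0.2], steps=2000)
    print(np.round(P, 3))
```

With these assumed numbers, the ant's distribution should concentrate on the zone where food is most likely, in line with the behavior described for the model.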

In order to successfully complete the proposed research, the following are specific steps 
to be taken: 

1. Finding engineering or economic problems with properties similar to a social insect activity and constructing an effective model of that activity. Many works have discussed models of social behavior in insects; we will stress those that appear to have the simplest rules.

2. Abstracting the rules that govern the communication and motion behaviors of insects and embedding them into artificial collectives in order to solve the proposed tasks.

3. Showing the effectiveness of the proposed algorithms by analysis or simulation.
Appendix A
Proofs 


A.1 Preliminaries

The following facts can be derived easily from the properties of optimal trajectories and are helpful in the subsequent arguments.

Facts: Let $\eta$, $C$ and $\eta_F$ be as defined in Eqs. (3.4), (3.5) and (3.8), let $x_k(t)$ be a trajectory of Eq. (3.1), and let $x^*(t)$ be an optimal trajectory of Eq. (3.3) or Eq. (3.7). Then the following properties hold:

1. $\eta(a, b, T, t_0, \sigma) \le C(x_k, t_0, \sigma)$ for any $x_k$ with $x_k(t_0) = x^*(t_0)$ and $x_k(t_0 + \sigma) = x^*(t_0 + \sigma)$, where $x^*(t)$ satisfies Eq. (3.3).

2. $\eta(a, c, T, t_0, T) \le \eta(a, b, \sigma, t_0, \sigma) + \eta(b, c, T - \sigma, t_0 + \sigma, T - \sigma)$

3. $C(x_k, t_0, T) = C(x_k, t_0, \sigma) + C(x_k, t_0 + \sigma, T - \sigma)$

4. $\eta_F(a, b, t_0, \sigma) \le \eta(a, b, T, t_0, \sigma)$

5. $\eta(a, b, T, t_0, \sigma) = C(x^*, t_0, \sigma)$, where $x^*(t)$ satisfies Eq. (3.3).

A.2 Proof of Lemma 4.1 

It is enough to show that the cost of the iterated trajectories is non-increasing in $k$. Consider the pursuit process between the $(k-1)$th and $k$th agents. As shown in Fig. A.1, the dotted line, denoted by $x_{k-1}(t)$ on $[t_{k-1}, t_{k-1} + T]$, indicates the leader’s path. The solid lines, denoted by $x_k(t)$, are the trajectories of the follower, and the dashed lines, denoted by $\hat{x}_k(t)$, are the planned trajectories, as described before. We use $\tilde{x}_k(t)$ to denote the trajectory that the follower would obtain by copying the leader’s trajectory with a delay of $\Delta$, i.e. $\tilde{x}_k(t + \Delta) = x_{k-1}(t)$. Therefore the cost along it is the same as the cost along the leader’s path.



Figure A.1: Sketch of Sampled Local Pursuit

The follower leaves the starting state at time $t_k$, while the leader leaves it at time $t_{k-1}$, where $t_k = t_{k-1} + \Delta$. For $t \in [t_k, t_k + \delta]$, the follower moves on an optimal trajectory from state $x_k(t_k)$ to $x_{k-1}(t_k)$ over $\Delta$ units of time. Thus from Fact 1:

$$\eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \Delta) \le C(\tilde{x}_k, t_k, \Delta) = C(x_{k-1}, t_{k-1}, \Delta) \tag{A.1}$$

The right-hand side is the cost along the leader’s path during the first $\Delta$ units of time; the left-hand side is the optimal cost from $x_k(t_k)$ to $x_{k-1}(t_k)$.

At time $t_k + \delta$ the follower reaches the state $x_k(t_k + \delta)$. Recalling that the trajectory driven by $u_{t_k}(\tau)$ is optimal from $x_k(t_k)$ to $x_{k-1}(t_k)$, and using Fact 3, we can divide the cost into two parts, one realized and the other planned¹, i.e.

$$\eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \Delta) = \eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \delta) + \eta(x_k(t_k + \delta), x_{k-1}(t_k), \Delta - \delta, t_k + \delta, \Delta - \delta) \tag{A.2}$$

From (A.1) and (A.2):

$$\eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \delta) \le C(x_{k-1}, t_{k-1}, \Delta) - \eta(x_k(t_k + \delta), x_{k-1}(t_k), \Delta - \delta, t_k + \delta, \Delta - \delta) \tag{A.3}$$

At time $t_k + \delta$, the follower updates its trajectory to catch the leader at its new location $x_{k-1}(t_k + \delta)$. Since this trajectory is optimal from $x_k(t_k + \delta)$ to $x_{k-1}(t_k + \delta)$ over time $\Delta$, any path $x_k(t)$, $t \in [t_k + \delta, t_k + \delta + \Delta]$, from $x_k(t_k + \delta)$ to $x_{k-1}(t_k + \delta)$ over time $\Delta$ that passes through $x_{k-1}(t_k)$ at time $t_k + \Delta = t_k + \delta + (\Delta - \delta)$ has equal or greater cost. From Fact 2 it follows that

¹ These two pieces are both optimal with respect to their corresponding end points.

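$$
\begin{aligned}
\eta(x_k(t_k + \delta), x_{k-1}(t_k + \delta), \Delta, t_k + \delta, \Delta)
&\le \eta(x_k(t_k + \delta), x_{k-1}(t_k), \Delta - \delta, t_k + \delta, \Delta - \delta) + \eta(x_{k-1}(t_k), x_{k-1}(t_k + \delta), \delta, t_k + \Delta, \delta) \\
&\le \eta(x_k(t_k + \delta), x_{k-1}(t_k), \Delta - \delta, t_k + \delta, \Delta - \delta) + C(\tilde{x}_k, t_k + \Delta, \delta) \\
&= \eta(x_k(t_k + \delta), x_{k-1}(t_k), \Delta - \delta, t_k + \delta, \Delta - \delta) + C(x_{k-1}, t_k, \delta)
\end{aligned}
\tag{A.4}
$$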

We can also divide this cost into a realized part and a planned one, i.e. 

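$$
\begin{aligned}
\eta(x_k(t_k + \delta), x_{k-1}(t_k + \delta), \Delta, t_k + \delta, \Delta)
&= \eta(x_k(t_k + \delta), x_{k-1}(t_k + \delta), \Delta, t_k + \delta, \delta) \\
&\quad + \eta(x_k(t_k + 2\delta), x_{k-1}(t_k + \delta), \Delta - \delta, t_k + 2\delta, \Delta - \delta)
\end{aligned}
\tag{A.5}
$$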

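From (A.1) to (A.5), we obtain

$$
\begin{aligned}
C(x_k, t_k, 2\delta) &= \eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \delta) + \eta(x_k(t_k + \delta), x_{k-1}(t_k + \delta), \Delta, t_k + \delta, \delta) \\
&\le C(x_{k-1}, t_{k-1}, \Delta) + C(x_{k-1}, t_k, \delta) - \eta(x_k(t_k + 2\delta), x_{k-1}(t_k + \delta), \Delta - \delta, t_k + 2\delta, \Delta - \delta) \\
&= C(x_{k-1}, t_{k-1}, \Delta + \delta) - C(\hat{x}_k, t_k + 2\delta, \Delta - \delta)
\end{aligned}
\tag{A.6}
$$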

where $\eta(x_k(t_k + 2\delta), x_{k-1}(t_k + \delta), \Delta - \delta, t_k + 2\delta, \Delta - \delta) = C(\hat{x}_k, t_k + 2\delta, \Delta - \delta)$ follows from the fact that the planned trajectory $\hat{x}_k$ is optimal.



Figure A.2: First two steps in sampled local pursuit 

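We repeat this procedure until $t = t_k + n\delta$, where $\Delta + (n-1)\delta < T$ and $\Delta + n\delta > T$. This choice of $n$ means that the leader has not reached the final state at the last updating time $t_k + (n-1)\delta$, and

$$C(x_k, t_k, n\delta) = \sum_{i=0}^{n-1} \eta(x_k(t_k + i\delta), x_{k-1}(t_k + i\delta), \Delta, t_k + i\delta, \delta) \le C(x_{k-1}, t_{k-1}, \Delta + (n-1)\delta) - C(\hat{x}_k, t_k + n\delta, \Delta - \delta) \tag{A.7}$$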
When $t \in [t_k + n\delta, t_k + T)$, the leader has reached the final state and stays there. During this period, no matter how many times the follower updates its movement, it moves along the same path that was determined at time $t = t_k + n\delta$. This path, indicated by the last solid line in Fig. A.1, is locally optimal between the states $x_k(t_k + n\delta)$ and $x_k(t_k + T)$ over $T - n\delta$ units of time. Therefore

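$$
\begin{aligned}
C(x_k, t_k + n\delta, T - n\delta) &= \eta(x_k(t_k + n\delta), x_{k-1}(t_{k-1} + T), T - n\delta, t_k + n\delta, T - n\delta) \\
&\le C(\hat{x}_k, t_k + n\delta, \Delta - \delta) + C(x_{k-1}, t_k + (n-1)\delta, T - (n-1)\delta - \Delta)
\end{aligned}
\tag{A.8}
$$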

From (A.7) and (A.8), we obtain

$$
\begin{aligned}
C(x_k, t_k, T) &\le C(x_{k-1}, t_{k-1}, \Delta + (n-1)\delta) + C(x_{k-1}, t_k + (n-1)\delta, T - (n-1)\delta - \Delta) \\
&= C(x_{k-1}, t_{k-1}, T)
\end{aligned}
\tag{A.9}
$$

We have shown that the cost incurred by the follower is no greater than the leader’s. Writing $C_k = C(x_k, t_k, T)$ for convenience, we have $C_k \le C_{k-1}$. Clearly $C_k$ is bounded below if there exists an optimal trajectory from the starting state to the target state. Hence we conclude that

$$\lim_{k \to \infty} C_k = C \tag{A.10}$$


A.3 Proof of Lemma 4.2 

Suppose there exists more than one limiting trajectory, and let $x_1(t)$ and $x_2(t)$ be two of them, where $x_1(t)$ differs from $x_2(t)$ for $t \in [t_1, t_2] \cup [t_3, t_4] \cup \cdots$. From Lemma 4.1, these two trajectories must have the same cost.

Let the leader $x_{k-1}(t)$ travel along $x_1(t)$, while the follower $x_k(t)$ travels along $x_2(t)$. If no update occurs during $[t_1, t_2]$, then $x_2(t)$ has less cost during $[t_1, t_2]$, because the follower moves along $x_2(t)$ and the local optimum is unique. The same argument on the other time intervals shows that the whole cost along $x_2(t)$ is less than that along $x_1(t)$ if no update occurs during $t \in [t_1, t_2] \cup [t_3, t_4] \cup \cdots$, which contradicts the fact that the two trajectories have the same cost.

Next, assume that only one update occurs during $[t_1, t_2]$, as Fig. A.3 indicates. Separate the curves during $[t_1, t_2]$ into several segments (the curve styles have the same meaning as in Lemma 4.1), and denote the cost along curve $i$ by $C_i$. From the uniqueness of the local optimum, we have $C_1 + C_5 < C_3$ and $C_2 < C_5 + C_4$. Hence $C_1 + C_2 < C_3 + C_4$, which means that $x_2(t)$ has less cost than $x_1(t)$ during $[t_1, t_2]$.

If there are multiple updates during $[t_1, t_2]$, the updates do not change the fact that the cost along $x_2(t)$ is less than that along $x_1(t)$. Hence the cost along $x_2(t)$ is still less than that along $x_1(t)$ for $t \in [t_1, t_2]$, no matter how many updates occur.
Figure A.3: There is one update between two trajectories


Iterating over the other time intervals shows that the whole cost along $x_2(t)$ must be less than that along $x_1(t)$, which again is a contradiction.

A.4 Proof of Lemma 4.3 

Let the leader, say the $(k-1)$th agent, move along the limiting trajectory. From Lemma 4.2, the limiting trajectory satisfies $x_{k-1}(t) = x_k(t + \Delta)$ for all $t \in [t_{k-1}, t_{k-1} + T]$. We first claim that on the time interval $[t_k + \delta, t_k + \Delta]$ the planned trajectory agrees with the realized one, i.e. $\hat{x}_k(t) = x_k(t)$ for $t \in [t_k + \delta, t_k + \Delta]$. Suppose that $\hat{x}_k(t) \ne x_k(t)$ for some $t \in [t_k + \delta, t_k + \Delta]$. Because $\hat{x}_k(t)$ is optimal from $x_k(t_k + \delta)$ to $x_k(t_k + \Delta)$, the trajectory

$$
\begin{cases}
\hat{x}_k(t), & t \in [t_k + \delta, t_k + \Delta) \\
x_k(t), & t \in [t_k + \Delta, t_k + \delta + \Delta]
\end{cases}
$$

has less cost than the trajectory $x_k(t)$, $t \in [t_k + \delta, t_k + \delta + \Delta]$, which is updated by the follower at time $t = t_k + \delta$ and is supposed to be optimal from $x_k(t_k + \delta)$ to $x_k(t_k + \delta + \Delta)$. This is a contradiction. Hence $\hat{x}_k(t) = x_k(t)$ for all $t \in [t_k + \delta, t_k + \Delta]$. The same argument applies to the other time intervals.

The first planned trajectory is smooth for $t \in [t_k, t_k + \Delta]$ because locally optimal trajectories are smooth, and the trajectory planned at the second update is smooth for $t \in [t_k + \delta, t_k + \delta + \Delta]$ for the same reason. Since $\hat{x}_k(t) = x_k(t)$ for all $t \in [t_k + \delta, t_k + \Delta]$, the two plans agree on this overlap, so the actual trajectory $x_k(t)$, $t \in [t_k, t_k + 2\delta]$, is smooth. Continuing this argument shows that the whole trajectory $x_k(t)$, $t \in [t_k, t_k + T]$, is smooth.
A.5 Proof of Lemma 4.4


We restate the lemma as follows: if $x^*(t)$, $t \in [0, t_1 + \Delta_1]$, and $x^*(t)$, $t \in [t_1, T]$, are two locally optimal trajectories and Condition 4.1 is satisfied, where $0 < t_1 < t_1 + \Delta_1 < T$, then the trajectory $x^*(t)$, $t \in [0, T]$, is a local minimum.

We take $0 < \Delta < \Delta_1$. From the principle of optimality, $x^*(t)$, $t \in [0, t_1 + \Delta]$, and $x^*(t)$, $t \in [t_1, T]$, are two locally optimal trajectories with respect to their corresponding end points.

Suppose that $x^*(t)$, $t \in [0, T]$, is not a local minimum. Then there must exist an $\epsilon' < \epsilon$ and another optimum $x(t) \in D \times [0, T]$ satisfying $\|x(t) - x^*(t)\|_\infty < \epsilon'$ and $C(x(t), 0, T) < C(x^*(t), 0, T)$, as Fig. A.4 shows.



Figure A.4: Overlapping local minima lead to an overall local minimum

Construct two optimal trajectories $y_1(t)$ and $y_2(t)$, $t \in [t_1, t_1 + \Delta]$, connecting $x(t)$ and $x^*(t)$ such that $x^*(t_1) = y_2(t_1)$, $x^*(t_1 + \Delta) = y_1(t_1 + \Delta)$, $x(t_1) = y_1(t_1)$ and $x(t_1 + \Delta) = y_2(t_1 + \Delta)$. From the principle of optimality, $x^*(t)$ and $x(t)$, $t \in [t_1, t_1 + \Delta]$, are both optimal trajectories with respect to their corresponding end points. Now with the condition of Eq. (4.1), we obtain


$$
\begin{aligned}
C(y_1(t), t_1, \Delta) &< C(x(t), t_1, \Delta) + \zeta\Delta \\
C(y_2(t), t_1, \Delta) &< C(x^*(t), t_1, \Delta) + \zeta\Delta
\end{aligned}
\tag{A.11}
$$

Since $x^*(t)$, $t \in [0, t_1 + \Delta]$, and $x^*(t)$, $t \in [t_1, T]$, are two unique locally optimal trajectories, we have

$$
\begin{aligned}
C(x^*(t), 0, t_1) + C(x^*(t), t_1, \Delta) &< C(x(t), 0, t_1) + C(y_1(t), t_1, \Delta) \\
C(x^*(t), t_1, \Delta) + C(x^*(t), t_1 + \Delta, T - t_1 - \Delta) &< C(x(t), t_1 + \Delta, T - t_1 - \Delta) + C(y_2(t), t_1, \Delta)
\end{aligned}
\tag{A.12}
$$

Combining (A.11) and (A.12) leads to

$$C(x^*(t), 0, T) + C(x^*(t), t_1, \Delta) < C(x(t), 0, T) + C(x^*(t), t_1, \Delta) + 2\zeta\Delta$$
which simplifies to

$$C(x^*(t), 0, T) < C(x(t), 0, T) + 2\zeta\Delta \tag{A.13}$$

$C(x(t), 0, T)$ is assumed to be less than $C(x^*(t), 0, T)$, but if we take

$$0 < \Delta < \frac{C(x^*(t), 0, T) - C(x(t), 0, T)}{2\zeta},$$

then Eq. (A.13) cannot hold. Since $\Delta$ can be chosen arbitrarily small, this is a contradiction. Hence $x^*(t)$, $t \in [0, T]$, must be a local minimum.


A.6 Proof of Lemma 4.5 

Suppose that $\lambda + \Delta < T$. As Fig. A.5 indicates, at time $t_k + \lambda$ the follower moves onto the locally optimal trajectory $x_k(t)$, $t \in [t_k + \lambda, t_k + \lambda + \Delta]$. Define a function $G: D \times [0, T] \times \mathbb{R}^+ \to D \times [0, T]$ to represent the new trajectory, denoted by $G(\lambda, x_{k-1}(t))$. The cost along the follower’s trajectory is

$$
\begin{aligned}
C(x_k, t_k, T) &= C(x_k, t_k, \lambda) + \eta(x_k(t_k + \lambda), x_{k-1}(t_k + \lambda), \Delta, t_k + \lambda, \Delta) + C(x_k, t_k + \lambda + \Delta, T - \lambda - \Delta) \\
&\le C(x_{k-1}, t_{k-1}, \lambda) + C(x_{k-1}, t_{k-1} + \lambda, \Delta) + C(x_{k-1}, t_{k-1} + \lambda + \Delta, T - \lambda - \Delta) \\
&= C(x_{k-1}, t_{k-1}, T)
\end{aligned}
\tag{A.14}
$$

The same argument applies to the case $\lambda + \Delta \ge T$.



A.7 Proof of Lemma 4.6 

Suppose the cost along the leader’s trajectory $x_{k-1}(t)$, $t \in [t_{k-1}, t_{k-1} + T]$, is $C_{k-1}$. Set up a trajectory sequence $x_k^i(t)$, $t \in [t_k, t_k + T]$, $i = 1, 2, \dots$, with corresponding costs $C_k^i$. Let $x_k^0(t) = x_{k-1}(t)$ and $x_k^i(t) = G((i-1)\delta, x_k^{i-1}(t))$, as Fig. A.6 indicates, where $G$ is defined in the proof of Lemma 4.5.

Figure A.6: Trajectory sequence of $x_k^i(t)$.

According to Lemma 4.5,

$$C_k^i \le C_k^{i-1} \quad \Rightarrow \quad C_k^i \le C_k^0 = C_{k-1}$$

for $\delta > 0$.

Let $\delta = T/i$; then $\delta \to 0$ as $i \to \infty$, and the trajectory $x_k^i(t)$ undergoes exactly the same updating process as in continuous local pursuit. Therefore the follower’s cost satisfies $C_k = \lim_{i \to \infty} C_k^i \le C_{k-1}$. Since the sequence $\{C_k\}$ is non-increasing and, as in the proof of Lemma 4.1, bounded below, it converges to a limit.